Data Communications Overview


Character codes are usually sent between a terminal and a computer, but they may also be sent between terminals or between computers. Terminals and computers are referred to in the telecommunications industry as Data Terminal Equipment (DTEs). Character code sets differ in size as well as in how they associate letters, digits, and punctuation with binary codes. Older sets tend to be small (2**5 or 32 characters). Examples include Western Union Telex, Western Union Telegraph, United Press International, and TWX. These code sets share a common problem, namely that 26 upper case letters plus 10 digits is greater than 32. The problem was solved by using the two escape codes Shift In (SI) and Shift Out (SO). After the SO code, subsequent received codes were displayed as letters and punctuation. When numbers were required, the SI code was sent and subsequent codes were displayed as numbers. If a bit in one of the letter or number codes was corrupted, the code could appear to be an SI or SO code. As an aid in detecting this type of error, Western Union telegrams repeated, at the end of the message, any number sent in the telegram.
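The shift mechanism can be sketched in a few lines of Python. The 5-bit table here is a made-up three-entry fragment for illustration only, not the actual Telex or TWX assignments, and the numeric values chosen for SO and SI are likewise assumptions.

```python
# Hypothetical 5-bit codes; real Telex/TWX tables differ.
SO = 0b11111   # assumed value: codes that follow are letters
SI = 0b11011   # assumed value: codes that follow are numbers

LETTERS = {0b00001: "A", 0b00010: "B", 0b00011: "C"}
NUMBERS = {0b00001: "1", 0b00010: "2", 0b00011: "3"}

def decode(codes):
    """Decode a sequence of 5-bit codes, tracking the current shift state."""
    table = LETTERS              # assume the receiver starts in letters mode
    out = []
    for code in codes:
        if code == SO:
            table = LETTERS
        elif code == SI:
            table = NUMBERS
        else:
            out.append(table.get(code, "?"))
    return "".join(out)

message = [SO, 0b00001, 0b00010, SI, 0b00001, 0b00010]
print(decode(message))                                   # AB12
# If the SI code is lost or corrupted, the digits print as letters:
print(decode([SO, 0b00001, 0b00010, 0b00001, 0b00010]))  # ABAB
```

The second call shows why a single damaged shift code is costly: every character after it is displayed from the wrong table until the next shift code arrives.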

In the sixties Digital Equipment Corporation used Western Union Teletypes as terminals. These early Teletypes employed a 6-bit (2**6 or 64 character) code which still limited I/O to upper case letters, a few punctuation marks, and digits, but at least the SI and SO escape codes were not required. The New York Stock Exchange also uses a 6-bit code for its automated postings. American Standard Code for Information Interchange (ASCII) was then developed. ASCII is a 7-bit code, but it is usually sent with an eighth bit which can be used for error checking (a parity bit) or to double the size of the character set. IBM introduced its own code set: Extended Binary Coded Decimal Interchange Code (EBCDIC). It is the only true 8-bit code standard.
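As an illustration of the eighth bit used for error checking, here is a minimal Python sketch of even parity on a 7-bit ASCII character. The function names are hypothetical, and placing the parity bit in the most significant position is an assumption of this sketch.

```python
def with_even_parity(ch):
    """Return the 8-bit value of a 7-bit ASCII character with even parity in bit 7."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    ones = bin(code).count("1")
    parity = ones % 2            # 1 when the count of one bits is odd
    return code | (parity << 7)

def parity_ok(byte):
    """True if the 8-bit value contains an even number of one bits."""
    return bin(byte).count("1") % 2 == 0

a = with_even_parity("A")        # 'A' = 0x41 has two one bits, parity bit stays 0
c = with_even_parity("C")        # 'C' = 0x43 has three one bits, parity bit is set
print(hex(a), hex(c))                        # 0x41 0xc3
print(parity_ok(c), parity_ok(c ^ 0x04))     # True False
```

Flipping any single bit changes the count of ones from even to odd, so the receiver detects the error. Two flipped bits cancel out and go unnoticed.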


To limit cable costs, all telecommunication equipment uses multiplexing. A telegraph provides the simplest example. Morse code employs a series of dots and dashes to represent a character. Dots are short pulses of electrical current compared to the longer dash pulses. Thus, short and long pulses are multiplexed on the same wire (circuit) over time. In modern telecommunications, pulses are usually the same duration and a group of pulses is sent together to represent one character code. A pulse can only be in one of two states, but there are many names for the two states. When the circuit is closed, current is flowing, or a logical one is being sent, the line is said to be in the "mark" condition. When the circuit is open, current is stopped, or a logical zero is being sent, the line is said to be in the "space" condition. A character code begins with the data communication circuit in the mark condition (that's right, current flows between the terminal and computer when nothing is being said), and during each data bit the line may or may not be in the mark condition. If the mark condition appears, a logical one is recorded, otherwise a logical zero. Figure 1 shows this multiplexing format.

                start  | <- five to eight data bits -> | stop bit(s)
              1 ----   -  -  -  -  -  -  -  -  -  -  - ---------   Mark
                   |   |   |   |   |   |   |   |   |   |   |   |
                   | S | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | S | S |
                   |   |   |   |   |   |   |   |   |   |   |   |
              0    -----  -  -  -  -  -  -  -  -  -  -             Space

                      Figure 1.  Asynchronous Code Format.

The first bit sent is always a space and is called the start bit. It signals the receiving DTE that a character code is coming. The next five to eight bits, depending on the code set employed, represent the character. In the ASCII code set the eighth data bit may be a parity bit. The next one or two bits are always in the mark condition and are called the stop bit(s). They provide a "rest" interval for the receiving DTE so that it may prepare for the next character, which may begin immediately after the stop bit(s). The rest interval was required by the old mechanical Teletypes which used a motor driven camshaft to decode each character. At the end of each character the motor needed time to strike the character bail (print the character) and reset the camshaft.

There are six basic steps in receiving a character code. First, to keep track of time, the receiver employs a clock which "ticks." While the line is idle in the mark condition, the receiver samples the line at 16 times the data rate. In other words, a bit interval is equal to 16 clock ticks. In this way the receiver can determine the beginning of the start bit and "move over" to the center of the bit time for data sampling. Second, when the line goes into the space state, declare a "looking for start bit" condition and wait one half the bit interval or eight clock ticks. Third, sample the line again and if it has not remained in the space condition, consider this to be a spurious voltage change and go back to step one. Fourth, if the line was still in the space state, then consider this a valid start bit. Shift the start bit into an eight-bit shift-register and wait one bit time or 16 clock ticks. Fifth, after one bit time sample the line (the data should have been there for the last eight clock ticks, and should remain for eight more clock ticks). Now shift the sample into the shift-register. Sixth, repeat the wait and sample of steps four and five seven more times. After the eighth shift, the start bit will "migrate" into a flip-flop indicating character received. Go to step one.
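The receiving steps can be sketched in Python as a software model of the line. IDLE and START are this sketch's own names for the resting line level and the start-bit level. Their numeric values, the least-significant-bit-first ordering, and the helper names frame, to_line, and receive are all assumptions made for illustration. A real receiver does this in hardware.

```python
IDLE, START = 1, 0      # resting line level and start-bit level (assumed values)
TICKS_PER_BIT = 16      # receiver clock ticks per bit interval

def frame(byte, data_bits=8, stop_bits=1):
    """Bit sequence for one character: start bit, data (LSB first), stop bit(s)."""
    data = [(byte >> i) & 1 for i in range(data_bits)]
    return [START] + data + [IDLE] * stop_bits

def to_line(bits, idle_bits=2):
    """Expand bits into line samples, one sample per receiver clock tick."""
    levels = [IDLE] * idle_bits + bits + [IDLE] * idle_bits
    return [lv for lv in levels for _ in range(TICKS_PER_BIT)]

def receive(line, data_bits=8):
    """Recover one character from line samples, or None if no valid start bit."""
    t = 0
    while t < len(line):
        # Step 1: sample every tick until the line leaves the idle level.
        if line[t] == IDLE:
            t += 1
            continue
        # Steps 2-3: wait half a bit time and confirm the start bit is still there.
        mid = t + TICKS_PER_BIT // 2
        if mid >= len(line) or line[mid] != START:
            t += 1              # spurious change, resume the search
            continue
        # Steps 4-6: sample each data bit at its center, one bit time apart.
        value = 0
        for i in range(data_bits):
            pos = mid + (i + 1) * TICKS_PER_BIT
            if pos >= len(line):
                return None
            value |= line[pos] << i
        return value
    return None

line = to_line(frame(ord("K")))
print(chr(receive(line)))       # K
```

Sampling at the center of each bit gives the receiver up to half a bit time of tolerance for clock differences between the two sides, which is the point of the 16-tick clock.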

All of this seems simple enough, but for it to work the transmitter and receiver must agree on the values of five parameters. First, both sides must agree on the number of bits per character. Second, the speed or Baud of the line must be the same on both sides. Note that Baud refers to "voltage switches per second." Therefore, the common phrase "baud rate" is incorrect since it refers to voltage switches per second (Baud) per second (rate) or acceleration. Furthermore, the "B" in Baud is capitalized in deference to J. M. Emile Baudot (1845-1903) who invented the Baudot code about 1874. His code was adopted by the French government in 1877. Baud is thought to be a contraction of Baudot. Third, both sides must agree to use or not use parity. Fourth, if parity is used, both sides must agree on using odd or even parity. Fifth, the number of stop bits must be agreed upon. Having said all this, most DTEs today employ eight data bits, no parity, and one stop bit. With the start bit, each character then occupies ten bit times on the line. Thus there is a rule-of-thumb that the number of characters per second is equal to the Baud divided by 10.
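The rule of thumb comes from counting the bits sent per character. A small Python sketch follows, assuming one signal change per bit so that Baud equals bits per second. The function name is hypothetical.

```python
def chars_per_second(baud, data_bits=8, parity=False, stop_bits=1):
    """Characters per second for a given line speed and character format."""
    bits_per_char = 1 + data_bits + (1 if parity else 0) + stop_bits  # 1 = start bit
    return baud / bits_per_char

for baud in (300, 1200, 9600):
    print(baud, chars_per_second(baud))
# 300 30.0
# 1200 120.0
# 9600 960.0

# Two stop bits stretch each character to eleven bit times:
print(chars_per_second(110, data_bits=8, stop_bits=2))   # 10.0
```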


Teletypes used "current loop" for an electrical interface and, therefore, so did many of the computers of the sixties and seventies. Today current loop is no longer used. The Electronic Industries Association (EIA), in cooperation with the Bell System, independent modem manufacturers, and computer manufacturers, developed a standard called RS-232-C to replace the old current loop interfaces. The RS-232-C standard defines the electrical signal characteristics, the mechanical characteristics, a functional description of the interchange (handshaking) circuits, and a list of interchange subsets for various applications.

Electrical specifications. The RS-232-C electrical specification states that the open circuit voltage shall not exceed 30 volts. The RS-232-C driver shall be able to sustain a short circuit without damage and the short circuit current shall not exceed 0.5 ampere. At the receiver, the mark condition is a voltage more negative than -3 v but no more negative than -15 v. The space condition is a voltage more positive than +3 v but no more than +15 v. The RS-232-C driver must assert the mark condition between -5 and -15 v. The space condition must be asserted between +5 and +15 v. Total driver, receiver and cable capacitance (shunt capacitance) shall not exceed 2,500 picofarads.
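The voltage limits quoted here can be written as a small Python sketch. The function names are hypothetical, and treating everything between -3 v and +3 v as "undefined" is this sketch's reading of the limits rather than wording from the standard.

```python
def receiver_state(volts):
    """Classify a received voltage using the receiver limits quoted in the text."""
    if -15 <= volts < -3:
        return "mark"
    if 3 < volts <= 15:
        return "space"
    return "undefined"

def driver_ok(volts):
    """True if a driver output falls in the 5 to 15 v range of either polarity."""
    return 5 <= abs(volts) <= 15

for v in (-12, -4, 0, 4, 12):
    print(v, receiver_state(v), driver_ok(v))
# -12 mark True
# -4 mark False
# 0 undefined False
# 4 space False
# 12 space True
```

A level of -4 v is accepted by a receiver but is not a legal driver output. That gap between the 5 v driver limit and the 3 v receiver limit is the allowance for noise picked up along the cable.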

The RS-232-C standard has shown three basic limitations. First, DTE power supplies usually provide +5 v, but to use RS-232-C they must also provide +12 and -12 volts. Second, common twisted-pair cable capacitance is approximately 40-50 picofarads per foot, which against the 2,500 picofarad limit means that 15 m (~50 ft) is the maximum distance allowed between two DTEs or between a DTE and its Data Communications Equipment (DCE), such as a modem. Third, the standard uses a common ground yet only provides a two volt noise margin (the difference between the 5 v a driver must assert and the 3 v a receiver must recognize).
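The distance limit is simple arithmetic on the capacitance budget. The Python sketch below assumes the whole 2,500 picofarads is available to the cable, which overstates the length slightly because driver and receiver capacitance also count against the limit.

```python
MAX_SHUNT_PF = 2500          # capacitance limit from the electrical specification
METERS_PER_FOOT = 0.3048

def max_cable_feet(pf_per_foot, max_pf=MAX_SHUNT_PF):
    """Longest cable, in feet, that stays within the capacitance limit."""
    return max_pf / pf_per_foot

for pf in (40, 50):
    feet = max_cable_feet(pf)
    print(f"{pf} pF/ft -> {feet:.1f} ft ({feet * METERS_PER_FOOT:.2f} m)")
# 40 pF/ft -> 62.5 ft (19.05 m)
# 50 pF/ft -> 50.0 ft (15.24 m)
```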

In real life, RS-232-C integrated circuits are much more robust than the standard suggests. Cable has been run up to 1.2 km (~4,000 ft) and still found to work at 1200 Baud. At 9600 Baud, RS-232-C only works up to 75 m (~250 ft). If the RS-232-C cable is run between campus buildings, it tends to become less reliable since the ground potential between buildings can shift by much more than two volts.

To overcome the limitations of RS-232-C, EIA has produced another standard called RS-422. RS-422 uses operational amplifiers in a "common mode rejection" configuration. Thus the two signal wires are isolated from ground and the receiving operational amplifier only responds to the difference in voltage between the two signal wires regardless of the common potential both wires may be at. Furthermore, operational amplifiers work over a large voltage range and therefore the standard +5 v power supply may also drive the communication circuits. Tests with RS-422 have shown that it will support 100,000 Baud at 1.2 km, one megaBaud at 150 m, and 10 megaBaud at 12 m (~40 ft).
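The benefit of responding only to the voltage difference can be shown with a toy Python model. The voltages, the function names, and the mapping of polarity to a bit value are all assumptions for illustration. The model ignores real receiver thresholds and common-mode range limits.

```python
def single_ended_receive(signal_v, ground_shift_v=0.0):
    """RS-232-C style: the receiver measures the signal against its own ground."""
    seen = signal_v + ground_shift_v
    if seen < -3:
        return "mark"
    if seen > 3:
        return "space"
    return "undefined"

def differential_receive(a_v, b_v, ground_shift_v=0.0):
    """RS-422 style: both wires shift together, so only their difference matters."""
    a_seen = a_v + ground_shift_v
    b_seen = b_v + ground_shift_v
    return 1 if a_seen - b_seen > 0 else 0   # bit mapping is an assumption

# A 4 volt ground shift pushes a -6 v mark into the undefined region...
print(single_ended_receive(-6), single_ended_receive(-6, ground_shift_v=4))
# mark undefined

# ...but leaves the differential result unchanged.
print(differential_receive(5, 0), differential_receive(5, 0, ground_shift_v=4))
# 1 1
```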

Mechanical Specifications. The RS-232-C connector varies but most often is a DB-25. Probably the second most common RS-232-C connector is the DB-9 used on the IBM PC/AT.

                           CD   SG   DSR  CTS  RTS  RCV  TRN  GND
        \   *    *    *    *    *    *    *    *    *    *    *    *    *  /
         \ 13   12   11   10    9    8    7    6    5    4    3    2    1 /
          \   *    *    *    *    *    *    *    *    *    *    *    *   /
           \ 25   24   23   22   21   20   19   18   17   16   15   14  /
                            Ring      DTR            RCLK RCV2 TCLK TRN2

                           Yellow Orange Red Brown Black
                              SG   DTR  TRN  RCV  CD
                           \   *    *    *    *    *  /
                            \  5    4    3    2    1 /
                             \   *    *    *    *   /
                              \  9    8    7    6  /
                               Ring  CTS  RTS  DSR
                              Grey Purple Blue Green

        Figure 2.  Pin assignments for the DB-25 and DB-9 (IBM PC/AT).

Signal description and protocol. Chassis (protective) ground (DB-25 pin 1). This pin is attached to the metal chassis and, in turn, to the third conductor in the power receptacle. Usually this pin is left UNCONNECTED since it is not part of the data circuit. Furthermore, if the two DTEs draw their power from different and distant power transformers, a difference in ground potential may drive current through pin 1 and result in a shock hazard or equipment damage.

Transmitted data (TRN - DB-25 pin 2 or DB-9 pin 3). From DTE to DCE. Data should not be transmitted until the DSR, DTR, RTS, and CTS circuits are all asserted.

Received data (RCV - DB-25 pin 3 or DB-9 pin 2). From DCE to DTE.

Request to send (RTS - DB-25 pin 4 or DB-9 pin 7). From DTE to DCE. Computer indicates to the modem that it wishes to transmit. RTS is usually reserved for half-duplex circuits and the computer has to check with the modem to make sure the line is not busy with an incoming character code.

Clear to send (CTS - DB-25 pin 5 or DB-9 pin 8). From DCE to DTE. In response to RTS, the modem establishes the data flow direction and asserts CTS. CTS also requires that DSR be asserted by the modem and that CD be previously established.

Data set ready (DSR - DB-25 pin 6 or DB-9 pin 6). From DCE to DTE. DSR indicates that the modem has the telephone line "off hook," the line is not in "alternate voice" mode, and dialing requirements, if any, have been completed. When the telephone line is used for voice and data communication, the computer must re-test DSR before reestablishing data communication.

Carrier detect [also known as "received line signal detector"] (CD - DB-25 pin 8 or DB-9 pin 1). From DCE to DTE. Indicates that the local modem is detecting the "carrier" frequency of a remote modem.

Data terminal ready (DTR - DB-25 pin 20 or DB-9 pin 4). From the DTE to DCE. Indicates that the computer will accept telephone calls from the modem. If DTR is not asserted, the modem will not answer the telephone when it rings.

Ring (DB-25 pin 22 or DB-9 pin 9). From DCE to DTE. Indicates the telephone is ringing, but is not usually used. Why not?

Protocol steps. The RS-232-C protocol has six basic steps or phases. First, the modem asserts "ring." Second, the computer decides if it is appropriate to answer the phone and, if so, asserts DTR. Third, in response to DTR the modem attempts to say "hello" to the other modem and, if successful, asserts CD. Fourth, the computer which wants to transmit asserts RTS. Fifth, if the line is not being used for voice or is half-duplex idle or full-duplex, the modem asserts CTS. Sixth, data is exchanged.
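The six phases can be modeled as a small Python class that refuses to move data until the earlier circuits are asserted. The class and method names are hypothetical. DSR is left out because the six steps do not mention it, and the line_free flag stands in for the voice and half-duplex checks of the fifth step.

```python
class Link:
    """Tracks the interchange circuits between one computer (DTE) and its modem (DCE)."""

    def __init__(self):
        self.ring = self.dtr = self.cd = self.rts = self.cts = False
        self.log = []

    def modem_ring(self):                      # step 1
        self.ring = True
        self.log.append("ring")

    def computer_answer(self):                 # step 2
        if not self.ring:
            raise RuntimeError("no incoming call")
        self.dtr = True
        self.log.append("DTR")

    def modem_connect(self, remote_carrier=True):   # step 3
        if self.dtr and remote_carrier:
            self.cd = True
            self.log.append("CD")

    def computer_request_to_send(self):        # step 4
        self.rts = True
        self.log.append("RTS")

    def modem_clear_to_send(self, line_free=True):  # step 5
        if self.rts and self.cd and line_free:
            self.cts = True
            self.log.append("CTS")

    def send(self, data):                      # step 6
        if not (self.dtr and self.cd and self.rts and self.cts):
            raise RuntimeError("handshake incomplete")
        self.log.append(f"data:{data}")

link = Link()
link.modem_ring()
link.computer_answer()
link.modem_connect()
link.computer_request_to_send()
link.modem_clear_to_send()
link.send("hello")
print(" -> ".join(link.log))   # ring -> DTR -> CD -> RTS -> CTS -> data:hello
```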

Connection Configurations. The RS-232-C standard is designed to interconnect a DTE and DCE. In this configuration wires are run between the two connectors pin-for-pin. Oddly, many of the RS-232-C connections are between two DTEs rather than between a DTE and DCE. To interconnect two DTEs, a "null modem" is required. There are many possibilities but only a simple version is shown here. Figure 3 shows a typical connection between a computer and terminal. Note that the DB-25 pin numbers may differ on either side of the arrows. In this example only four conductors are required: TRN, RCV, SG, and one more. TRN and RCV have been defined above and SG provides the signal return path. The fourth conductor directs DTR from the terminal into CD of the computer. Thus when the terminal is turned on, DTR is asserted. Sensing CD, the computer generates a "login" message.

                           DB-25           DB-25
                           Computer        Terminal
                           DTE             DTE
                           TRN 2  -------> 3 RCV
                           RCV 3 <-------  2 TRN
                           RTS 4           5 CTS
                           CTS 5           4 RTS
                           DSR 6           6 DSR
                           SG  7 <-------> 7 SG
                           CD  8 <------- 20 DTR
                           DTR 20          8 CD

              Figure 3. Typical computer - terminal connection.
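The four conductors of Figure 3 can also be written down as a lookup table, which is a convenient form for checking a cable. This is only a sketch in Python. The dictionary and function names are hypothetical and the pin numbers are copied from the figure.

```python
# DB-25 signal names used in Figure 3
NAMES = {2: "TRN", 3: "RCV", 7: "SG", 8: "CD", 20: "DTR"}

# Conductors of Figure 3: terminal pin -> computer pin
TERMINAL_TO_COMPUTER = {2: 3, 3: 2, 7: 7, 20: 8}

def computer_sees(terminal_pin):
    """Return (pin, name) on the computer side for a terminal pin, or None if unwired."""
    pin = TERMINAL_TO_COMPUTER.get(terminal_pin)
    return None if pin is None else (pin, NAMES[pin])

print(computer_sees(20))   # (8, 'CD')   terminal DTR drives computer CD
print(computer_sees(2))    # (3, 'RCV')  terminal TRN drives computer RCV
print(computer_sees(4))    # None        RTS is not wired in this cable
```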

                           DB-9            DB-9
                           Computer        Computer
                           DTE             DTE
                           TRN 3  -------> 2 RCV
                           RCV 2 <-------  3 TRN
                           CD  1 <--   --> 1 CD
                                   |   |
                           DTR 4  --   --  4 DTR
                                   |   |
                           DSR 6 <--   --> 6 DSR
                           SG  5 <-------> 5 SG
                           RTS 7  --   --  7 RTS
                                   |   |
                           CTS 8 <--   --> 8 CTS
                           RNG 9           9 RNG

              Figure 4. DB-9 computer - computer connection.

Programmer's View. Serial line computer interfaces vary widely. The earliest Apple computers provided the most simplistic interfaces. The programmer viewed three I/O registers: a "configuration switch" register, an input bit register, and an output bit register. The configuration register determined bits per character, parity, odd or even parity, and Baud. The programmer was responsible for assembling the character bit-by-bit from start to stop bits.

Today, personal computer serial line interfaces employ a Universal Asynchronous Receiver and Transmitter (UART). The UART performs the character assembly algorithm and presents status flags to the programmer such as "character received," "transmit buffer ready," "parity error," "input buffer overflow," and "interrupt enable." The computer interface also supports the RS-232-C modem bits such as DSR, Ring, CTS and CD. Programmers input and output characters by reading and writing the interface data registers.
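From the programmer's side, using such an interface amounts to testing status flags and then touching the data register. The Python sketch below uses a software stand-in for the hardware. The class, the flag names, and the bit positions are all hypothetical and do not describe any particular UART chip.

```python
RX_READY = 0x01   # "character received"      (bit positions are assumptions)
TX_READY = 0x02   # "transmit buffer ready"
OVERRUN  = 0x04   # "input buffer overflow"

class FakeUart:
    """Software stand-in for a UART's status and data registers."""

    def __init__(self):
        self.status = TX_READY
        self._rx = None
        self.sent = []

    def line_delivers(self, byte):
        """Simulate the hardware finishing assembly of a received character."""
        if self.status & RX_READY:
            self.status |= OVERRUN      # previous character was never read
        self._rx = byte
        self.status |= RX_READY

    def read_data(self):
        """Read the data register, which clears the 'character received' flag."""
        self.status &= ~RX_READY
        return self._rx

    def write_data(self, byte):
        self.sent.append(byte)

def getc(uart):
    """One polling pass: return the waiting character, or None if there is none."""
    if uart.status & RX_READY:
        return uart.read_data()
    return None

def putc(uart, byte):
    """Write one character if the transmit buffer is ready; report success."""
    if not uart.status & TX_READY:
        return False
    uart.write_data(byte)
    return True

uart = FakeUart()
print(getc(uart))               # None
uart.line_delivers(ord("A"))
print(chr(getc(uart)))          # A
print(putc(uart, ord("B")), uart.sent)   # True [66]
```

An interrupt-driven program would run the body of getc from the interrupt handler instead of polling, but the register accesses are the same.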

Minicomputers and mainframes support serial interfaces at the Direct Memory Access (DMA) level. In DMA I/O, the programmer establishes a buffer which may contain data or device configuration parameters. Then a command is given to the interface to read or write the memory buffer without CPU intervention. Upon completion of the buffer reading or writing, the serial interface asserts an interrupt indicating that the buffer transaction has completed.