
The TCP/IP protocol suite was developed by the Department of Defense to allow computers on different networks designed by different vendors to share resources across a common internetwork. TCP and IP are only two of the protocols in this suite, but because they're the best known, the entire family is commonly referred to as TCP/IP. That's something of a misnomer; the most accurate name for this set of protocols would be something like "Internet Protocol Suite." The protocols supporting traditional services like file transfer, electronic mail, and remote login were completed by 1974. They were readily accepted because they delivered these basic services across a large number of client and server systems. Typically, individual machines are first connected to a LAN (Ethernet or Token Ring). TCP/IP shares the LAN with other uses (e.g., a Novell file server or Windows for Workgroups peer systems). One device provides the TCP/IP connection between the LAN and the rest of the world.
However, TCP/IP is not just another one of those arcane telecommunication terms floating around. It is just as invisible to you as your brain, but anytime you communicate with any other node on the Internet, you can be sure that the transaction is being managed by TCP/IP. It is the Internet. If you understand how TCP/IP works, you'll understand how the Internet works.
TCP/IP's history simultaneously explains how it works and why the Internet functions the way it does. For decades computer designers had agreed that sharing computing resources between computers was inherently more efficient than duplicating the same resources on machine after machine. This generic concept took on considerable urgency at the height of the cold war. In the mid-1960s the Department of Defense contracted the Rand Corporation to replace its command-and-control communication system. Interdependent communication hubs were chained together at that time, and if any one hub were destroyed in a nuclear "event," the entire network would fail. In 1964 a researcher at Rand, Paul Baran, designed a computer-communications network that had no hub, no central switching station, and no governing authority. It assumed that no one link in the network was reliable, i.e., would survive an attack. In Baran's scheme, each message was cut into strips of data and stuffed into electronic envelopes, or packets, and marked with the address of the sender, receiver, and the relative position of each data-strip in the message. The packets were then released into the web of interconnected computers and reassembled when they reached their destination. If any packets were missing or corrupted, the message was retransmitted. A packet that was undeliverable via one network in the web was switched to another. This packet-switching scheme is the electronic foundation of data transmission on the Internet today.
The TCP/IP-managed packet-switching system proliferated widely in North American networks for very good historical reasons. In the late 1960s and early 1970s, computing centers operated autonomously, as if no other sites existed. In those days, networks were not designed to allow resource sharing between users in differing networks. To log in to or share files with remote computers, modification of software and files was required. Users had to be trained for each different computer. To help remedy this situation, the Department of Defense had created the Advanced Research Projects Agency (ARPA). Its original mission included providing a communication network between computers that--besides surviving a nuclear attack--permitted remote logins to distant computers, file sharing, and--though not in ARPA's original plan--intersite electronic mail. ARPA's network was named the ARPANET. Another of ARPANET's objectives was to permit access from general-purpose computers (i.e., what we now call desktops) to computers specializing in list processing, information retrieval, etc. (i.e., what we now call servers). These are the roots of the client/server paradigm, file transfer (FTP), Telnet, and e-mail.
ARPA conducted a lot of research for the Department of Defense in the late 1960s, but a 1968 experiment was especially critical to the nature of the Internet. This experiment focused on tying together geographically dispersed dissimilar computers, which were called "hosts" because they housed data and computational resources, and attached terminals, the "guests." If different types of computers were to communicate with each other, they needed to share common communication standards. That common standard, TCP/IP, was developed in 1973-1974 around the packet-switching system. ARPA, which was renamed DARPA in 1972 to better reflect its defensive posture, decided to implement TCP/IP around the Unix operating system and to publicly distribute TCP/IP code through the University of California at Berkeley. Because the code was nonproprietary, it spread rapidly among universities, private companies, and research centers. By 1983 DARPA stated that all computers connected to the ARPANET were required to use TCP/IP. It is now the standard suite of data communications protocols for Unix-based computers, such as the SIUC CWIS server. In the mid-1980s, the National Science Foundation (NSF) began to fund the proliferation of ARPANET technology to many universities. Demands on the ARPANET gradually exceeded its capabilities, and it was disbanded in 1990 and replaced with the NSFNET "backbone," which you and I "see" today as the Internet.
Now that we've told you where TCP/IP came from and how pervasively it's woven into the Internet, let's take a closer look at how it actually works.
The Internet has been called the "granddaddy" of computer networks. The "great-granddaddy" of networks must therefore be the Internet's father, ARPANET. But the Internet is neither new nor old; it's merely the latest in a long line of attempts to fill a need to communicate across distances that exceed the limitations of human senses. Our species has never had a way to transmit data "as is" from Point A to Point B. Words and numbers simply won't fit through wires or fly far enough through air. Physically carrying information to Point B is expensive and time-consuming, so over the generations, our predecessors have hit upon several variations on the same basic theme: encoding information (we'll call it data hereafter) at a sending site; sending the encoded data through whatever medium suits it; and then decoding the data at the receiving site. For centuries African peoples communicated across great distances with drumbeats. American Indians used smoke signals. Ships at sea communicate via semaphore flagging, and submarines use sonar. Samuel F. B. Morse used alternating pulses of electricity--dots and dashes--in his telegraphs.
Even though the transmitting energy source varies drastically in those examples--from sound to light to electricity--and the media the energy flows through varies from air to water to copper wire--they all share a common need for matching encoding and decoding schemes at sender and receiver. An uninterrupted data stream or a randomly interrupted data stream is useless. The data stream must be divided into mutually agreed-upon units, whether they are drumbeats, puffs of smoke, or the packets of binary digits the TCP/IP protocols manage.
All digital computers operate by opening or closing switches with tiny pulses of direct current called "bits," which is an abbreviation for "binary digit." You and I open and close switches with a human digit, probably an index finger. But that light switch on your wall and all the millions of miniaturized switches in your desktop computer share a common fact of life: they are either open or closed. All digital computers do is open and close these switches in your computer's circuits, which is very simple, but they do it very, very fast.
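As a small illustration of the idea that everything in a computer reduces to open or closed switches, the following Python sketch writes a short piece of text as a string of bits and then turns the bits back into text. The function names are made up for this example, and the use of ASCII (eight bits per character) is an assumption made to keep the sketch short.

```python
def text_to_bits(text):
    """Return the ASCII bytes of text as a string of 0s and 1s, 8 per character."""
    return "".join(format(byte, "08b") for byte in text.encode("ascii"))


def bits_to_text(bits):
    """Rebuild text from a bit string produced by text_to_bits."""
    chunks = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    return bytes(int(chunk, 2) for chunk in chunks).decode("ascii")


bits = text_to_bits("Hi")
print(bits)                 # 0100100001101001
print(bits_to_text(bits))   # Hi
```

Each 1 can be read as a closed switch and each 0 as an open one; if even one bit changes position, the text that comes back out is different.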
This generic description applies to any digital computer, whether it is connected to a network or not. Let's say you're accustomed to using a standalone computer. The Central Processing Unit (CPU) in your computer manages all those millions of switch settings with tremendous speed and accuracy, shuttling masses of them onto your monitor screen, onto a disk drive and back again, and myriad other tasks. The position of every switch setting--every bit--must be preserved in its exact position. This sounds like chaos, but your desktop PC actually encloses a relatively stable world because the hardware and software inside are compatible with each other.
Now, let's say your standalone computer has been connected to a B-jack or a modem, and you want to retrieve or send data over the network. Those same bits that were orchestrated so precisely inside your computer must now leave their tightly controlled world. They must pass through who knows what "jungle" of extraordinarily varying hardware and software to get to a destination address. It's literally impossible to know whose circuitry your data will pass through; routing pathways change from minute to minute. That was at the heart of the packet-switching scheme from the beginning. Some software in your computer must prepare data for delivery to its Ethernet/Token Ring card or modem and for eventual passage through this "jungle" of networks. Some software must package all those bits in such a way that they can be safely handed off to the network and delivered intact no matter what machinery they pass through. Networkers used to have to do this manually.
But now TCP/IP handles it. A protocol within TCP/IP splits a given unit of data, such as an electronic letter or a file being retrieved, into packets that can be switched from router to router to avoid failed or busy nodes and channels. Other protocols place protocol control information (PCI) headers around them, until the resulting "datagrams" can be routed through the network as independent entities. Let's take a closer look at how this is done.
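The splitting and reassembly just described can be sketched in a few lines of Python. This is a toy model, not the real TCP/IP format: the dictionary fields (src, dst, seq, total), the host names, and the eight-byte packet size are all assumptions chosen for illustration.

```python
import random


def packetize(message, sender, receiver, size=8):
    """Cut message (bytes) into packets tagged with sender, receiver, and position."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [
        {"src": sender, "dst": receiver, "seq": n, "total": len(chunks), "data": chunk}
        for n, chunk in enumerate(chunks)
    ]


def reassemble(packets):
    """Rebuild the message, or return None if any packet is missing."""
    if not packets:
        return None
    by_seq = {p["seq"]: p["data"] for p in packets}
    total = packets[0]["total"]
    if set(by_seq) != set(range(total)):
        return None  # in a real exchange, the missing data would be retransmitted
    return b"".join(by_seq[n] for n in range(total))


message = b"Packets may arrive in any order."
packets = packetize(message, "host-a", "host-b")
random.shuffle(packets)                  # simulate packets taking different routes
print(reassemble(packets) == message)    # True
print(reassemble(packets[1:]))           # None: one packet was lost
```

Because every packet carries its own position number, the receiver can put the message back together no matter what order the packets arrive in, and it can tell when one has gone missing.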
The wide range of functions operating on TCP/IP networks is far too complex to be handled by a single "layer" of functions. TCP/IP, like all communications protocols, is layered. This means that each layer is responsible for fewer tasks and that each layer doesn't need to know what its neighboring layers know. Transactions with the Internet are divided into four protocol layers. These layers are dispersed throughout the sending, routing, and receiving computers' hardware and disk files. However, human beings like visual metaphors, and so these layers are commonly spoken of as a protocol "stack."
TCP/IP-managed packet switching is a "store-and-forward," "connectionless" scheme. Packets for Data Transaction One are pushed out into the network, and Data Transaction Two is immediately begun, even before verification has come back that One was successful. Think of its opposite, a telephone call. A switched phone call requires that a number be dialed and that all resources be set up before the call can be completed. And that circuit remains unusable to anyone else for as long as the call stays connected. The Internet would bog down instantly if it used that type of switching.
The overall purpose of this scheme is to achieve a high degree of independence from specific protocols on the subnetworks the router must communicate with. The routers between networks are designed to be "transparent" to host computers on a subnetwork. Data units should flow through the router as transparent, independent entities. Storing the communication protocols on the hosts--that's you--frees up the router to concentrate on fewer tasks, such as managing the traffic between subnetworks. That's why we've told you you must have a communication protocol manager, such as Wollongong, Trumpet, or the one built into Windows 95, installed on your computer. The router can't concern itself with application-level protocols (such as e-mail), which may or may not be proprietary. In fact, at least ideally, it shouldn't even care what kind of subnetwork is talking to it. Imagine a bucket brigade of strangers putting out a fire in the dark: you don't need to know who is passing the next bucket of water to you or even what is in the bucket. All you recognize is the bucket handle.
The bottom layer is the Subnetwork protocol, such as Ethernet (for Macintosh or Unix machines) or Token Ring (for DOS/Windows machines). This is the hardware component of the stack.
Working up the stack, TCP is software comprising the Service Provider Protocol layer; it sits above the Internet Protocol (IP) layer described next. The TCP programs must be running on your machine and on any machine it needs to communicate with. TCP detects errors or lost data and triggers retransmission until the data is correctly and completely received. TCP is responsible for breaking up messages into datagrams and reassembling them properly. Some messages fit into a single datagram, so it is pointless to fire up the full TCP machinery to break them up! In those cases, the User Datagram Protocol (UDP), which resides side-by-side with TCP in this layer, is used to transmit data from source to destination, but without the error-recovery and delivery-verification capabilities of TCP.
Each technology has its own convention for transmitting messages between two machines within the same network. On a LAN, messages are sent between machines by supplying the six-byte unique identifier (the "MAC" address). In an SNA network, every machine has Logical Units with their own network address. DECNET, Appletalk, and Novell IPX all have a scheme for assigning numbers to each local network and to each workstation attached to the network. On top of these local or vendor-specific network addresses, TCP/IP assigns a unique number to every workstation in the world, the "IP number."
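To make the two kinds of address concrete, here is a short Python sketch that prints a six-byte MAC address in its usual written form and shows that a dotted IP number is really a single number. It assumes the four-byte (IPv4) form of the IP number; the MAC address is made up, the IP address comes from a range reserved for documentation, and format_mac is a helper invented for this example.

```python
import ipaddress


def format_mac(raw):
    """Format six raw bytes as a conventional colon-separated MAC address."""
    if len(raw) != 6:
        raise ValueError("a MAC address is exactly six bytes")
    return ":".join(f"{b:02x}" for b in raw)


# Made-up hardware address, used only for illustration.
mac = format_mac(bytes([0x00, 0x1A, 0x2B, 0x3C, 0x4D, 0x5E]))

# 192.0.2.1 belongs to a range reserved for documentation (RFC 5737).
ip = ipaddress.IPv4Address("192.0.2.1")

print(mac)               # 00:1a:2b:3c:4d:5e
print(int(ip))           # 3221225985
print(ip.packed.hex())   # c0000201
```

The MAC address only matters to machines on the same LAN, while the IP number is what routers elsewhere on the Internet use to find the machine.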
TCP hands each of these datagrams down to the Internet Protocol (IP), which sits between TCP and the subnetwork layer and is responsible for routing data node by node to its destination. IP operates on gateway machines that move data from department to organization to region and then around the world. It provides routing from the department to the enterprise network, then to regional networks, and finally to the global Internet. IP routers make individual decisions about where to send the datagram. A router may be limited to a simple "clockwise" algorithm (i.e., always New York to Atlanta to Los Angeles to Denver to Chicago), or it may be capable of such sophisticated techniques as measuring data traffic and choosing the least-busy link. Along with each datagram, TCP must give IP the Internet address of the destination computer. IP doesn't care about what's in the datagram or even what's in the TCP header. It does add its own header so that gateways and other intermediate systems can forward the datagram. Its job is simply to find a route for the datagram and get it to the other end. It connects networks and routers into a coherent system, smoothing the way and managing a seamless, transparent pathway for data from the source to the final destination. The routers automatically reconfigure themselves when something goes wrong.
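The two routing styles mentioned above, a fixed "clockwise" rule and a least-busy-link rule, can be contrasted in a minimal Python sketch. Real routers consult routing tables maintained by routing protocols; the function names and the traffic figures below are hypothetical and exist only to show the difference in decision-making.

```python
def clockwise_next_hop(ring, current):
    """Fixed 'clockwise' rule: always forward to the next city in the ring."""
    i = ring.index(current)
    return ring[(i + 1) % len(ring)]


def least_busy_next_hop(link_load):
    """Load-aware rule: forward over the neighbor link with the least traffic."""
    return min(link_load, key=link_load.get)


ring = ["New York", "Atlanta", "Los Angeles", "Denver", "Chicago"]
print(clockwise_next_hop(ring, "Atlanta"))   # Los Angeles
print(clockwise_next_hop(ring, "Chicago"))   # New York

# Hypothetical traffic measurements (fraction of capacity in use)
# on the links leaving one router.
load = {"Atlanta": 0.80, "Chicago": 0.35, "Denver": 0.60}
print(least_busy_next_hop(load))             # Chicago
```

The clockwise router needs no information beyond its fixed list, while the load-aware router makes a fresh choice every time its measurements change.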
The upper layer is the Application Layer. This supports the direct interfaces to such user applications as file transfer (FTP), remote terminal access (Telnet), and electronic mail (e-mail).
A typical transaction works like this. Think of yourself, along with the requisite TCP/IP and client software on your desktop computer, as Host A. You are going to request some service from a server, which we'll call Host B. Let's say you have an application such as an FTP client (such as WS_FTP, a Windows-based client we include in our CWIS Suite) in the top layer of your protocol stack, and you wish to retrieve a file from Host B. FTP sends its request "down" to the protocol running in the layer beneath it, the Transport Layer.
This layer performs a variety of functions on the unit of data passed to it, including adding a header. (Each layer adds its own unique header.) This unit of data is now called a "segment." The transport layer passes the segment down to the Network Layer, also known as the IP layer, which performs specific services on the segment and adds its own header. The data unit, which is now known as a "datagram" in Internet terms, is passed to the lower layers. The Data Link Layer adds its header and a trailer. The data unit, now called a "frame," is then launched into the network by the physical layer (whatever hardware, e.g., a network card, you have in your computer).
This frame of data passes from your subnetwork through the router (or routers) into the subnetwork Host B resides on, and then into Host B's own protocol stack. The "downward" movement described above is reversed as the data frame enters the host, which we're calling a server. That is, it enters the host's physical layer--its network card--and works up through the network layer, transport layer, and ends with the application layer, where a response to Host A--your client--will be generated. Each layer's operations are governed by instructions in the header addressed to it. Each layer strips off and examines the data unit's header to determine the actions it is to take. Eventually the data unit created by the file transfer application (FTP in our case) arrives at the file transfer application residing on the host (also FTP). The server on Host B would then return the retrieved data (a file) to the client at Host A by transferring the data down through the layers in its protocol stack, through the subnetwork, through the router, into the next subnetwork, and up the layers of Host A's protocol stack.
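The trip down Host A's stack and back up Host B's can be sketched in Python by treating each header as a labeled string of bytes. The bracketed labels are placeholders rather than real header formats, the function names are invented for this example, and the file name in the request is hypothetical.

```python
TCP_HDR, IP_HDR = b"[TCP]", b"[IP]"
FRAME_HDR, FRAME_TRL = b"[ETH]", b"[CRC]"


def wrap(header, payload, trailer=b""):
    """Add one layer's header (and optional trailer) around a data unit."""
    return header + payload + trailer


def unwrap(unit, header, trailer=b""):
    """Check and strip one layer's header (and optional trailer)."""
    if not (unit.startswith(header) and unit.endswith(trailer)):
        raise ValueError("data unit does not carry the expected header/trailer")
    return unit[len(header):len(unit) - len(trailer)]


def send_down(data):
    """Host A: application data -> segment -> datagram -> frame."""
    segment = wrap(TCP_HDR, data)
    datagram = wrap(IP_HDR, segment)
    return wrap(FRAME_HDR, datagram, FRAME_TRL)


def pass_up(frame):
    """Host B: frame -> datagram -> segment -> application data."""
    datagram = unwrap(frame, FRAME_HDR, FRAME_TRL)
    segment = unwrap(datagram, IP_HDR)
    return unwrap(segment, TCP_HDR)


frame = send_down(b"RETR report.txt")
print(frame)            # b'[ETH][IP][TCP]RETR report.txt[CRC]'
print(pass_up(frame))   # b'RETR report.txt'
```

Notice that each layer touches only its own header and treats everything inside as opaque cargo, which is what lets the layers stay independent of one another.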
Other protocols handle specific tasks, e.g., transferring files between computers, sending mail, or finding out who is logged in on another computer.
These application protocols include the "big three." First, FTP allows a user on any computer to get or send files from or to another computer. FTP handles file transfer between machines with different character sets, end-of-line conventions, etc. Second, the network terminal protocol (telnet) allows users to log in to any other computer on the network. You're really still talking to your own computer, but telnet makes your computer invisible while it is running. When you log off the other computer, the telnet program exits, and you find yourself talking to your own computer. Third, e-mail allows you to send messages to users at other computers. Typically, mail is stored on a server until you and your e-mail client are ready to read it.
Besides these traditional services, TCP/IP can also provide access to network file systems (NFS), remote printing, remote execution, and other services we'll not discuss here. (It even includes a protocol [ICMP] for sending error messages about itself!)
The network has become as important as the computers it connects. Those of you already familiar with operating nonnetworked "standalone" computers will need to become familiar with networking terminology. (We began this document with a link to a glossary.) The software you'll need is available from the SIUC CWIS home page.