The Internet as we know it today is a large system of interconnected autonomous networks. That means a lot of diverse authorities are involved, each playing a certain role. The only common ground is that computers connected to “the” Internet want to communicate with each other. Or not. Or sort of, at least. Oh, and by the way: no, the Internet is not the blue “e” icon on your desktop. Neither is it the “web”, or whatever else people think the Internet might be.
I will present the big picture, an overview of the Internet’s architecture. This is going to be a bit simplified, and some parts are really superficial, but it is still a valid outline.
About IP Addresses
Communication on the Internet is based on a layered protocol stack. This means that when you access a particular end system within a network, a lot of protocols build on each other, each layer adding functionality while standing on the shoulders of the layer below.
People with a general idea of the techniques involved might have heard about the Hypertext Transfer Protocol (HTTP), a popular example of an application layer protocol, responsible for accessing web sites (including mine). This application layer protocol builds on top of the Transmission Control Protocol (TCP), a transport layer protocol responsible for communication between hosts. TCP in turn builds on top of the Internet Protocol (IP), which interconnects distinct hosts. That means the network layer provides end-to-end connectivity. The remaining, lower layers aren’t interesting here. You see, a lot of protocols are involved, each providing a certain functionality. From its logical perspective, each protocol communicates with its counterpart on the same layer at the other end. In reality, every packet steps down layer by layer, each layer adding its own header and perhaps a trailer. Once the packet has crossed the wire and reached its destination, the same happens in reverse: the packet travels back up to the highest layer, each layer stripping its header again, until it appears in the remote server application – for HTTP this is a web server.
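To make the stepping down and up a bit more tangible, here is a minimal toy sketch. The bracketed "headers" are made up for illustration and look nothing like real HTTP, TCP or IP headers; the function names are my own assumptions as well.

```python
# Toy model of encapsulation. The bracketed tags are illustrative stand-ins
# for real protocol headers, which are binary structures with many fields.
LAYERS = ["HTTP", "TCP", "IP"]  # application, transport, network

def encapsulate(payload: bytes) -> bytes:
    """Walk down the stack: each layer prepends its own header."""
    packet = payload
    for layer in LAYERS:
        packet = f"[{layer}]".encode() + packet
    return packet

def decapsulate(packet: bytes) -> bytes:
    """Walk back up the stack: each layer strips the header its peer added."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not packet.startswith(header):
            raise ValueError(f"expected {layer} header")
        packet = packet[len(header):]
    return packet

wire = encapsulate(b"hello")
print(wire)               # the IP header ends up outermost
print(decapsulate(wire))  # the receiver sees the original payload again
```

Note that the lowest layer's header ends up on the outside, which is why a router only needs to look at the IP part and never touches what is inside.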
While the Internet is a network of cooperating networks, as I’m going to point out below, it still needs a unique identifier to address an individual host. I already briefly introduced the network layer with its IP addresses (yup – those funny numbers you might see every once in a while). Those identify a particular host within this inter-network of networks. The IP address serves both as an identifier (which host to address) and as a locator (where to find this host). Therefore the IP address space is hierarchical. Every end system with such a unique IP address lies within a subnet, which has previously been assigned to an Internet Service Provider by a Regional Internet Registry. Such a subnet is a partition of the global IP address space, a fraction of it. This is what I meant by locator – to reach a particular host on the Internet, another end point needs to know where to find it. To achieve this, a route must be found to the destination network – this is done by addressing an end point by the (sub-)network it belongs to (remember: a network of networks).
22.214.171.124/16 is an example of such a subnet in Classless Inter-Domain Routing (CIDR) notation. That means, in this case, that the first sixteen bits of the address form the network prefix (the part selected by the netmask), whereas the remaining bits identify particular hosts. So this particular subnet ranges from
126.96.36.199 (including the network address and the broadcast address, the first and last address). Within the global address space the whole subnet is treated as one unit; it is addressed by its network prefix only. Only on the last router, the one directly connected to this given network, does the particular host address make a difference. This is done for efficiency: heavy fragmentation of the address space leads to big memory consumption on the routers, so people want to avoid subnets that are too small. If an autonomous system advertises such a subnet, it promises to be the final destination for it, or at least to forward the traffic further.
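If you want to play with CIDR notation yourself, Python's standard `ipaddress` module does the arithmetic for you. This is just a small sketch; `10.20.0.0/16` is a made-up private prefix I picked for illustration, not the subnet from my example.

```python
import ipaddress

# Assumption: an arbitrary private /16 prefix, chosen only for illustration.
net = ipaddress.ip_network("10.20.0.0/16")

print(net.netmask)            # mask with the first 16 bits set
print(net.network_address)    # first address of the range
print(net.broadcast_address)  # last address of the range
print(net.num_addresses)      # 2 ** (32 - 16) addresses in total

# A host belongs to the subnet if its first 16 bits match the prefix.
inside = ipaddress.ip_address("10.20.37.5") in net
outside = ipaddress.ip_address("10.21.0.1") in net
print(inside, outside)
```

A router far away only cares about the `/16` part; the remaining sixteen bits matter only once the packet has arrived at the network itself.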
About Autonomous Systems and Networks of Networks
Ok, I already introduced the Internet as a network of networks. So far I am steadily repeating myself. Now, what makes “the” Internet such an interesting thing? My claim is that there is no such thing as “the” Internet. Indeed, the word “Internet” already hints at this, as it is short for “internetwork”. So “the” Internet is a complex, loosely hierarchical system of autonomous networks, interconnected in a chaotic, partially meshed manner, with the Internet Protocol gluing it all together.
I will leave the historic aspects of the Internet aside here. You can read up on them in many places – use the Internet to find out; what else. Only one thing is worth mentioning here: the origins of the Internet lie in the (historic) requirement to link previously existing heterogeneous networks so that they could exchange messages. This fundamental requirement has not changed to this day; it is the same anarchistic principle now as it was in 1969. Imagine you would like to become a service provider.
As a precondition you have some interconnected computers. These days you are most likely using Ethernet to do so – however, you are completely on your own in how you achieve this connectivity. Why not use carrier pigeons? Congratulations – you can consider yourself an autonomous system now. The term “autonomous” means you are on your own. You have the authority to control your network: you control what you let in, what you let out, and which protocols and techniques you use. There is one limitation though: in order to establish connectivity with other networks, your network must be capable of running the Internet Protocol. Moreover, you need a globally routable address space which identifies your portion as soon as you want to link it to the Internet.
Next, go and apply for an Autonomous System Number. This is no more than a unique number you associate with your network. Other networks will identify you by this number. Now, being a service provider, you would like to advertise yourself to the outside world. To achieve that you begin to exchange reachability information with other, neighboring networks. Metaphorically speaking, you tell your neighbor something like “my dear friend, would you mind telling me which IP subnets you can reach? Oh, and by the way, I’d like to tell you about my subnet”. That’s it: as soon as you have advertised your networks to others, other networks can reach you, and since you have learned the others’ routes you can reach them as well.
This is a bit abstract, but in fact it is as easy as that. You do no more than plug a cable from one of your network devices into one of your adjacent neighbor networks. Those devices are so-called routers; they forward packets between different IP subnets. Ever heard of Internet exchange points? Those are possible rendezvous points for such adjacent networks (not the only ones, of course!). Again, this is a bit simplified; in reality it requires quite expensive investments, such as laying fiber or leasing dedicated lines and so on. Oh, and by the way: people won’t be happily waiting for you to peer with them. Most likely you have to pay for transit traffic, since advertising a route also means actually carrying packets to and from your network. This costs money. Obviously. That means you most likely need to pay your neighbor network to provide other networks with connectivity to you on your behalf.
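The "tell me yours, I tell you mine" exchange can be sketched in a few lines. This is a toy model under my own assumptions: the class and function names are invented, the AS numbers come from the private-use range, and the prefixes are arbitrary private ones. Real routers do this with a proper protocol, not with Python dictionaries.

```python
class AutonomousSystem:
    """Toy AS: knows its own prefixes and learns others from neighbors."""

    def __init__(self, asn, own_prefixes):
        self.asn = asn
        # prefix -> list of AS numbers a packet crosses after leaving this AS
        self.routes = {prefix: [] for prefix in own_prefixes}

    def advertise_to(self, neighbor):
        """Offer every known route to the neighbor, with our ASN prepended."""
        for prefix, path in self.routes.items():
            offered = [self.asn] + path
            if neighbor.asn in offered:
                continue  # the neighbor is already on this path: avoid loops
            known = neighbor.routes.get(prefix)
            if known is None or len(offered) < len(known):
                neighbor.routes[prefix] = offered

def peer(a, b):
    """Both sides tell each other what they can reach."""
    a.advertise_to(b)
    b.advertise_to(a)

# Hypothetical networks; ASNs and prefixes are illustrative only.
you = AutonomousSystem(64512, ["10.20.0.0/16"])
neighbor = AutonomousSystem(64513, ["10.30.0.0/16"])
upstream = AutonomousSystem(64514, ["10.40.0.0/16"])

peer(neighbor, upstream)        # these two already know each other
peer(you, neighbor)             # now you join the party
neighbor.advertise_to(upstream) # the neighbor passes your prefix on

print(you.routes)
print(upstream.routes)
```

After the last step the upstream network reaches your prefix through your neighbor, which is exactly the transit service you would typically have to pay for.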
About Shape and Topologies
In the wide area, the Internet consists of several thousand of those autonomous systems, each announcing one or several IP subnets that belong to its network. In the end this results in a conglomerate that we can now finally call the “Internet”. All the magic lies in determining how to get from A to B. This is trivial if you are directly connected; you know, the shortest path between two nodes is a straight line. It gets more complicated if you must consider transit nodes, as in “reach D from A via B and C”. And on the Internet the scale is larger – currently there are 690490 routing entries known. Yes, this means more than half a million IP subnets exist on the current Internet, and you must find your route to every single one. On the Internet, every node I labelled here as “A”, “B”, “C” and so on is a full autonomous system, i.e. a full network identified by its AS number.
The above diagram illustrates a fictional model of the Internet (well, probably a small fraction thereof, still having a valid shape though). It consists of several autonomous systems. Remember that each of those nodes represents an entire network, not a single host. The edges indicate peerings between those systems. AS 2 is obviously an Internet access provider, that is, a company providing individuals with access to the Internet. Most likely the ISP will connect its clients through the Point-to-Point Protocol (PPP). Details won’t matter here – in the end the system will assign the client a temporary IP address (
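Finding a way "over B over C" is a plain graph search if you ignore everything that makes real routing hard. Here is a minimal sketch using breadth-first search over a made-up peering graph; the graph and the function name `as_path` are my assumptions, and real inter-domain routing does not simply pick the fewest hops.

```python
from collections import deque

# Hypothetical peering graph: each letter stands for a whole autonomous system.
PEERINGS = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["B", "D"],
}

def as_path(graph, source, target):
    """Breadth-first search: returns one path with the fewest AS hops, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # target is not reachable from source

print(as_path(PEERINGS, "A", "D"))
```

With five nodes this is instant. With hundreds of thousands of prefixes and constantly changing links, nobody computes paths over a global map like this, which is why routes are exchanged between neighbors instead.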
188.8.131.52 in the example) and certain services, like routing to other end systems. For this purpose the ISP uses the IP subnet that was assigned to it –
184.108.40.206/16 in this case – being one of the 690k prefixes known to date.
In the right corner I indicated another AS, a service provider (maybe yours from above?). It is identified by ASN 1 and runs three routers, two on the edge of the system (ER2 and ER3) and one in the core (EC1). To represent actual end systems I drew a single server, which is identified by the IP
220.127.116.11 – belonging to the service provider subnet
Ok, Internet routing is a bit complicated. The question now is: how does a packet sent from the client find its destination, the server with the address
18.104.22.168 in my example. Some fundamentals have already been introduced, including the most fundamental ones: Classless Inter-Domain Routing and the basic functionality of a router. A router must not only learn routes from adjacent peers, it must also share them with the other routers in the system. Remember, within a single AS one entity controls the hardware. Therefore there are many ways to propagate advertised routes across your own network. You could even configure static routes manually. Most likely, though, the service provider will run an AS-internal routing protocol, called an IGP (short for Interior Gateway Protocol). A dynamic routing protocol builds a routing table automatically, and there are many options available, so IGP is a concept, not a particular protocol. Such a dynamic routing protocol determines the routes and the best paths between nodes.
You can imagine what the problem is: with a large bunch of routes, routers and IP-speaking devices, it becomes really complicated and time-consuming to keep all routing tables up to date by hand. To overcome this problem people deploy dynamic routing protocols. Using those, routers determine by themselves which routes they can reach. Examples of such protocols include OSPF (Open Shortest Path First), among other choices, all of which can be used as an IGP.
To communicate between autonomous systems you are not so flexible. You must use what everybody else uses: the Border Gateway Protocol (BGP). BGP is a so-called path vector protocol (I won’t go into too much detail here), and its quintessence is a single word: policies. This means that between autonomous systems routing decisions aren’t made by efficiency or best path (as IGPs do), but by policies. Those are local decisions, based on economic or political reasons.
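As the name suggests, OSPF computes shortest paths, and it does so with Dijkstra's algorithm over a map of the links. Below is a minimal sketch of that computation, not of OSPF itself. I borrowed the router names ER2, ER3 and EC1 from my diagram, but the links between them and their costs are pure assumptions.

```python
import heapq

# Assumed link costs inside the AS; only the router names come from the diagram.
LINKS = {
    "ER2": {"EC1": 1, "ER3": 5},
    "ER3": {"EC1": 1, "ER2": 5},
    "EC1": {"ER2": 1, "ER3": 1},
}

def shortest_paths(links, source):
    """Dijkstra: returns (cost to each router, next hop towards each router)."""
    dist = {source: 0}
    next_hop = {}
    heap = [(0, source, None)]  # (cost so far, router, first hop on this path)
    while heap:
        cost, node, first = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale entry: a cheaper path was found in the meantime
        if first is not None:
            next_hop.setdefault(node, first)
        for neigh, weight in links[node].items():
            new_cost = cost + weight
            if new_cost < dist.get(neigh, float("inf")):
                dist[neigh] = new_cost
                # Leaving the source, the neighbor itself is the first hop.
                heapq.heappush(heap, (new_cost, neigh, first or neigh))
    return dist, next_hop

dist, next_hop = shortest_paths(LINKS, "ER2")
print(dist)
print(next_hop)
```

With these costs ER2 sends traffic for ER3 through the core router EC1 rather than over the expensive direct link, and nobody had to type that into a table by hand.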
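To illustrate what "policies first" means, here is a deliberately tiny sketch of route selection. It assumes only two criteria: a locally configured preference value, and the AS path length as a tie-breaker. The real BGP decision process has more steps than that, and the `Route` class, the prefix and the AS numbers below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Route:
    prefix: str
    as_path: list    # AS numbers towards the destination
    local_pref: int  # assigned by local policy; higher means preferred

def best_route(candidates):
    """Policy wins: highest local preference first, shorter AS path second."""
    return max(candidates, key=lambda r: (r.local_pref, -len(r.as_path)))

# Two ways to reach the same hypothetical prefix.
via_paid_transit = Route("10.40.0.0/16", [64514], local_pref=100)
via_free_peering = Route("10.40.0.0/16", [64513, 64515, 64514], local_pref=200)

chosen = best_route([via_paid_transit, via_free_peering])
print(chosen.as_path)
```

The longer path wins here simply because the operator decided that free peering beats paid transit. Efficiency only comes into play once the policy values are equal.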
Either way, the end result is an IP forwarding table, a lookup table which determines the port on which to forward a particular packet. Every IP-speaking machine has such a thing – you do as well right now, since you can read my post, which is available online only. Most likely you have, like most clients connected to the Internet, a rather simple routing table (on Linux, type route or ip route to see yours; on Windows, route print). Chances are you have a default gateway, which means you forward every packet not destined for your local network to a single upstream router on your provider’s edge.
For an actual router the routing table is much more complicated, since in most cases it won’t use a default gateway but has its own entry for every single subnet known on the Internet. However, complex or simple, such a forwarding table is essential for IP routing and is based on the so-called longest prefix match. This means the destination of an arriving IP packet is compared against all entries in the table, and the entry whose prefix matches the most leading bits wins.
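Longest prefix match is easy to sketch with Python's standard `ipaddress` module. The table below is made up (private prefixes, invented port names), and the linear scan is only for readability; real routers typically use specialised data structures or hardware for this lookup.

```python
import ipaddress

# Hypothetical forwarding table: (prefix, outgoing port). 0.0.0.0/0 is the
# default route, which matches every IPv4 address with a prefix length of 0.
FORWARDING_TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), "upstream"),
    (ipaddress.ip_network("10.20.0.0/16"), "port1"),
    (ipaddress.ip_network("10.20.30.0/24"), "port2"),
]

def lookup(table, destination):
    """Return the port of the matching entry with the longest prefix."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, port) for net, port in table if addr in net]
    if not matches:
        return None  # no route to host
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(lookup(FORWARDING_TABLE, "10.20.30.7"))  # matches all three entries
print(lookup(FORWARDING_TABLE, "10.20.99.1"))  # matches the /16 and the default
print(lookup(FORWARDING_TABLE, "192.0.2.1"))   # matches only the default
```

Seen this way, a default gateway is nothing special: it is just the entry with the shortest possible prefix, so it only wins when nothing more specific matches.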
Another fundamental principle of the Internet is the Edge-to-Edge philosophy (more commonly known as the end-to-end principle). This means that for hosts connected to the Internet, the intelligence lives on the edge, at the end points of a connection. The network core is reasonably dumb; all you find there is some routing and switching gear (even though a large part of my post was about routing) and from time to time some application layer gateways (for example proxy servers and application level firewalls). This makes the Internet scalable and successful, because it makes it easy to deploy new services and ideas. A counterexample would be the plain old telephone network – could you imagine deploying a new value-added service by yourself? You can’t, because there the intelligence lives in the core of the network, not on the edge under your control. On the Internet you can. Everyone with a globally routable Internet address can deploy new services, and connect to already established services just by starting a client. Indeed, one of the most important philosophies that led to the Internet was the idea of supporting multiple types of services over a variety of networks.
On the other hand, this also introduces a lot of problems that I have already explained in previous posts (see here – German). The Internet is no good for accountability or data integrity – everyone can forge messages! The protocols won’t protect you from malicious messages unless you protect yourself on the application layer. But isn’t all this freedom what makes the Internet such a sexy place?