The Internet is Broken
Anyone with hands-on experience setting up long-haul VPNs over the Internet knows it is not a pleasant exercise. Even if you factor out the complexity of appliances and the need to work with old relics like IPsec, managing latency, packet loss, and high availability remains a huge problem on the Internet. Service providers know this too (and make billions on MPLS).
The bad news is that it is not getting any better. It doesn’t matter that available capacity has increased dramatically. The problem is in the way providers are interconnected and with how global routes are (mis)managed. It lies at the core of how the Internet was built, its protocols, and how service providers implemented IP routing. The same architecture that allowed the Internet to cost-effectively scale to billions of devices also set its limits.
Addressing these challenges requires a deep restructuring of the Internet fabric and core routing – and that restructuring should form the foundation for possible solutions. There isn’t going to be a shiny new router that will magically solve it all.
IP Routing’s Historical Baggage: Simplistic Data Plane
Whether the traffic is voice, video, HTTP, or email, the Internet is made of IP packets. Billions of them travel to their destinations carrying information. If they are lost along the way, it is the responsibility of higher-level protocols, such as TCP, to recover them. Packets hop from router to router, “aware” only of their next hop and their ultimate destination. Routers are the ones making the decisions about the packets: when a router receives a packet, it performs a lookup against its routing table, identifying the best next hop for the packet.
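That per-packet decision can be sketched as a longest-prefix-match lookup. The prefixes and next-hop addresses below are entirely illustrative, and a real router implements this in hardware rather than in a loop over the table:

```python
# Minimal sketch of a router's forwarding decision: a longest-prefix-match
# lookup in a routing table. All prefixes and next hops are hypothetical.
import ipaddress

# Hypothetical routing table: destination prefix -> next-hop address.
ROUTING_TABLE = {
    ipaddress.ip_network("10.0.0.0/8"): "192.0.2.1",
    ipaddress.ip_network("10.1.0.0/16"): "192.0.2.2",
    ipaddress.ip_network("0.0.0.0/0"): "192.0.2.254",  # default route
}

def next_hop(dst: str) -> str:
    """Return the next hop for dst: the most specific matching prefix wins."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTING_TABLE if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTING_TABLE[best]

print(next_hop("10.1.2.3"))  # matches 10.1.0.0/16, the most specific prefix
print(next_hop("10.9.9.9"))  # falls back to the broader 10.0.0.0/8 prefix
print(next_hop("8.8.8.8"))   # only the default route matches
```

Note what the lookup does not consider: nothing about the current latency, loss, or congestion of any next hop enters the decision – only the static table.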
From the early days of the Internet, routers were shaped by technical constraints. Processing power was scarce in the part of the router that moves packets along their path, the “data plane”. Access speeds and available memory were limited, so routers had to rely on custom hardware that performed minimal processing per packet and maintained no state. Communication with this restricted data plane had to be extremely simple and infrequent. Routing decisions were moved out to a separate process, the “control plane,” which instructed the data plane on the next hop.
This separation of control and data planes allowed architects to build massively scalable routers, handling millions of packets per second. However, even as processing power increased on the data plane, it wasn’t really used. The paradigm was, and still is, that the control plane makes all the decisions, the data plane executes the routing table, and apart from routing table updates, they hardly communicate. Getting “feedback” from the data plane was simply out of the question.
A modern router does not have any idea how long it actually took a packet to reach its next hop, or whether it reached it at all. Are neighbors congested? Maybe, maybe not. The router doesn’t even know if it is congested itself. And to the extent it does have information to share, that information is never communicated back to the control plane, where routing decisions are actually made.
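The kind of feedback that is missing is easy to picture: continuous per-link measurements of latency and loss. The sketch below simulates probing one link; the link model (loss rate, RTT range) is entirely hypothetical and stands in for a real measurement loop:

```python
# A sketch of the per-hop feedback a data plane could gather but never
# reports to the control plane: probe a link, track latency and loss.
# The link model below (5% loss, 30-80 ms RTT) is purely illustrative.
import random
from typing import Optional

def send_probe(rng: random.Random) -> Optional[float]:
    """Simulate one probe over a link: RTT in ms, or None if the probe is lost."""
    if rng.random() < 0.05:          # modeled 5% packet loss on this link
        return None
    return rng.uniform(30.0, 80.0)   # modeled RTT jitter between 30 and 80 ms

def measure_link(n_probes: int = 1000, seed: int = 42) -> dict:
    """Aggregate probe results into the link statistics a router could report."""
    rng = random.Random(seed)
    results = [send_probe(rng) for _ in range(n_probes)]
    delivered = [rtt for rtt in results if rtt is not None]
    return {
        "loss_rate": 1 - len(delivered) / n_probes,
        "avg_rtt_ms": sum(delivered) / len(delivered),
    }

print(measure_link())
```

A control plane fed with numbers like these could route around a lossy or slow next hop; in today’s architecture, they are simply never collected and passed upward.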
Ironically, this limited exchange between the control plane and the data plane was taken to the extreme in OpenFlow and Software-Defined Networking (SDN): the control plane and data plane were separated onto two different machines. This might be a good solution for cutting costs in the data center, but to improve global routing it makes more sense to substantially increase information sharing between the control plane and the data plane.