What happens when you type www.xyz.com in the browser and hit enter
At a very high level: the application (browser) asks the operating system's networking stack to connect to a given website. The hostname is first resolved to an IP address via DNS, the OS TCP/IP stack then performs the TCP handshake, the browser's TLS library performs the TLS handshake on top of that connection, and finally the browser sends the HTTP GET request to fetch the content from the server.
PRO TIP: In the TCP/IP stack, each lower-layer protocol serves the layer above it. Below you can see a typical TCP/IP stack as implemented in the Linux OS (focusing on IP sockets).
In the figure below we can see the flow of a packet within Zephyr², an open-source RTOS (Real-Time Operating System). The UDP packet moves down from the application level to the hardware level (NIC).
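To make the layering concrete, here is a minimal Python sketch of encapsulation on the way down the stack: the transport layer prepends a UDP header to the application payload, the network layer prepends an IPv4 header, and the link layer prepends an Ethernet header. The helper names, ports, IP addresses (documentation ranges), and MAC addresses are all hypothetical, the UDP checksum is left at zero (which UDP over IPv4 permits), and in Linux or Zephyr this work happens inside the network stack, not in application code.

```python
import ipaddress
import struct


def udp_header(src_port: int, dst_port: int, payload: bytes) -> bytes:
    # 8-byte UDP header: source port, destination port, length, checksum.
    # A checksum of 0 means "not computed", which UDP over IPv4 permits.
    return struct.pack("!HHHH", src_port, dst_port, 8 + len(payload), 0)


def ipv4_checksum(header: bytes) -> int:
    # One's-complement sum of the header's 16-bit words (RFC 1071).
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, protocol: int, payload_len: int) -> bytes:
    # 20-byte IPv4 header without options; the checksum field starts as 0.
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,        # version 4, IHL 5 (5 x 4 = 20 bytes)
        0,                   # DSCP/ECN
        20 + payload_len,    # total length
        0, 0,                # identification, flags/fragment offset
        64,                  # TTL
        protocol,            # 17 = UDP, 6 = TCP
        0,                   # checksum placeholder
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )
    checksum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]


def mac_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))


def ethernet_header(dst_mac: str, src_mac: str) -> bytes:
    # 14-byte Ethernet II header; EtherType 0x0800 means "IPv4 inside".
    return mac_bytes(dst_mac) + mac_bytes(src_mac) + struct.pack("!H", 0x0800)


def encapsulate(payload: bytes) -> bytes:
    # Hypothetical addresses, for illustration only.
    segment = udp_header(50000, 53, payload) + payload                    # transport
    packet = ipv4_header("192.0.2.10", "198.51.100.53", 17, len(segment)) + segment  # network
    return ethernet_header("02:00:00:00:00:02", "02:00:00:00:00:01") + packet  # link


frame = encapsulate(b"hello")
print(len(frame))  # 14 + 20 + 8 + 5 = 47
```

On the receiving side the stack works in reverse, peeling off the Ethernet, IP, and UDP headers before handing the payload to the application.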
Let’s look at each of the steps in detail.
Step 1 — DNS resolution
Why is DNS resolution required? The TCP/IP stack needs IP addresses (source and destination); it doesn't understand human-readable hostnames like www.xyz.com. So the first step is to resolve the hostname to an IP address.
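For a sense of what "resolving the hostname" actually puts on the wire, here is a minimal sketch that builds a DNS query for an A (IPv4 address) record in the RFC 1035 message format. The helper names and the query ID are hypothetical, and the sketch only builds the bytes; it does not send them or parse the reply.

```python
import struct


def encode_qname(hostname: str) -> bytes:
    # "www.xyz.com" -> b"\x03www\x03xyz\x03com\x00" (length-prefixed labels).
    out = b""
    for label in hostname.rstrip(".").split("."):
        encoded = label.encode("ascii")
        if not 0 < len(encoded) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        out += bytes([len(encoded)]) + encoded
    return out + b"\x00"  # zero-length root label ends the name


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question: QNAME, QTYPE=1 (A record), QCLASS=1 (IN, Internet).
    question = encode_qname(hostname) + struct.pack("!HH", 1, 1)
    return header + question


query = build_dns_query("www.xyz.com")
print(query.hex())
```

A stub resolver would typically send a message like this over UDP port 53 to its configured recursive resolver.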
My previous blog post discusses the Domain Name System and DNS resolution in great detail.
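At a high level, the name resolution request goes from the browser → browser/OS DNS cache → recursive resolver (typically your ISP's or a public DNS service) → root DNS server → TLD DNS server → authoritative DNS server. On a cache miss, it is the recursive resolver that walks the root, TLD, and authoritative servers on the client's behalf.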
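The lookup order can be sketched as a toy simulation: check the local cache first, and on a miss, follow referrals from a root server to a TLD server to an authoritative server. Every server name, the zone data, and the answer address (a documentation range) are hypothetical, and real resolvers also handle TTLs, CNAMEs, retries, and multiple servers per zone, all of which this sketch ignores.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical zone data. Each server holds either final answers or
# referrals to the server responsible for a more specific zone.
SERVERS: Dict[str, Dict[str, Dict[str, str]]] = {
    "root": {"answers": {}, "referrals": {"com.": "com-tld"}},
    "com-tld": {"answers": {}, "referrals": {"xyz.com.": "xyz-authoritative"}},
    "xyz-authoritative": {"answers": {"www.xyz.com.": "203.0.113.7"}, "referrals": {}},
}


def ask(server: str, name: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (answer, None) on a hit or (None, next_server) on a referral."""
    data = SERVERS[server]
    if name in data["answers"]:
        return data["answers"][name], None
    for zone, next_server in data["referrals"].items():
        if ("." + name).endswith("." + zone):
            return None, next_server
    raise LookupError(f"{server} has no data for {name}")


def resolve(name: str, cache: Dict[str, str], trace: List[str]) -> str:
    fqdn = name if name.endswith(".") else name + "."
    if fqdn in cache:                # browser/OS cache hit: no queries sent
        trace.append("cache")
        return cache[fqdn]
    server: Optional[str] = "root"   # the recursive resolver starts at the root
    while server is not None:
        trace.append(server)
        answer, server = ask(server, fqdn)
        if answer is not None:
            cache[fqdn] = answer     # remember it for the next lookup
            return answer
    raise LookupError(name)


cache: Dict[str, str] = {}
first: List[str] = []
print(resolve("www.xyz.com", cache, first), first)
second: List[str] = []
print(resolve("www.xyz.com", cache, second), second)
```

The first lookup walks all three tiers; the second is answered from the cache without any queries, which is why repeat visits to a site resolve so much faster.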
At this point, the network module (TCP/IP stack) has the destination IP address. Using its subnet mask, the host checks whether the destination is on its own subnet; if it is not, the packet is sent to the default gateway. The packet may pass through several gateways on the local network, but it eventually reaches the ISP's gateway, which forwards it onto the open internet.
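The "same subnet" decision comes down to applying the host's subnet mask to the destination address. Here is a minimal sketch using Python's standard ipaddress module, assuming a hypothetical host configured as 192.168.1.10/24 with a default gateway at 192.168.1.1; the function name next_hop is also hypothetical.

```python
import ipaddress


def next_hop(host_interface: str, gateway: str, destination: str) -> str:
    """Return the IP address the packet is handed to on the local link.

    Same subnet: deliver directly to the destination.
    Different subnet: hand the packet to the default gateway.
    """
    iface = ipaddress.ip_interface(host_interface)  # address plus prefix, e.g. /24
    dest = ipaddress.ip_address(destination)
    if dest in iface.network:
        return str(dest)
    return gateway


HOST = "192.168.1.10/24"  # hypothetical host address and mask
GATEWAY = "192.168.1.1"   # hypothetical default gateway

print(next_hop(HOST, GATEWAY, "192.168.1.42"))
print(next_hop(HOST, GATEWAY, "203.0.113.7"))
```

Once the next-hop IP is known, the host uses ARP to find that device's MAC address, which goes into the Ethernet header of the outgoing frame.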
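PRO TIP: A TCP/IP connection is identified by a five-tuple: the source IP address and destination IP address (in the IP header), the source port and destination port (in the TCP/UDP header), and the protocol (in the IP header). MAC addresses (in the Ethernet header) are not part of it, because they change at every hop.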
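In the usual definition, the five-tuple is the source and destination IP addresses, the source and destination ports, and the protocol. To show where each field lives, here is a minimal sketch that pulls it out of a raw IPv4 packet carrying TCP. It assumes an IPv4 header (reading the IHL field to skip any options) followed directly by a TCP header; the sample packet is hand-built with hypothetical addresses and ports.

```python
import ipaddress
import struct
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # 6 = TCP, 17 = UDP


def five_tuple(packet: bytes) -> FiveTuple:
    ihl = (packet[0] & 0x0F) * 4               # IPv4 header length in bytes
    protocol = packet[9]                       # protocol field (IP header)
    src_ip = str(ipaddress.IPv4Address(packet[12:16]))
    dst_ip = str(ipaddress.IPv4Address(packet[16:20]))
    # Ports are the first 4 bytes of the TCP (or UDP) header.
    src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    return FiveTuple(src_ip, dst_ip, src_port, dst_port, protocol)


# Hand-built sample: 20-byte IPv4 header + 20-byte TCP header (mostly zeroed).
ip_header = struct.pack(
    "!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0,
    ipaddress.IPv4Address("192.168.1.10").packed,
    ipaddress.IPv4Address("203.0.113.7").packed,
)
tcp_header = struct.pack("!HH", 51515, 443) + bytes(16)
sample = ip_header + tcp_header
print(five_tuple(sample))
```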
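Throughout the packet's journey across an IP network, the source and destination IP addresses stay the same (ignoring NAT, which typically rewrites the source address at the edge of a home or ISP network), even if the packet is encapsulated in additional headers along the way. The source and destination MAC addresses, however, change at every hop. The next hop is determined by the routing table on each router.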
You can see in the figure below that only the MAC addresses, not the IP addresses, change as the packet moves from gateway R1 to R4.
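The hop-by-hop behaviour can be simulated in a few lines: the IP addresses are set once by the sender, while each router rewrites the Ethernet source and destination MACs before forwarding (and decrements the TTL). The router names R1 to R4 mirror the figure, but every MAC and IP value here is hypothetical, and NAT is ignored.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    ttl: int


# Hypothetical path: for each router, the MAC of its outgoing interface and
# the MAC of the next device on the link it forwards onto.
PATH = [
    ("R1", "02:00:00:00:01:02", "02:00:00:00:02:01"),
    ("R2", "02:00:00:00:02:02", "02:00:00:00:03:01"),
    ("R3", "02:00:00:00:03:02", "02:00:00:00:04:01"),
    ("R4", "02:00:00:00:04:02", "02:00:00:00:05:01"),  # last hop: the server
]


def forward(frame: Frame) -> List[Frame]:
    hops = [frame]
    for _router, out_mac, next_mac in PATH:
        # Layer-2 addresses are rewritten and TTL drops by one;
        # the layer-3 source and destination addresses are untouched.
        frame = replace(frame, src_mac=out_mac, dst_mac=next_mac, ttl=frame.ttl - 1)
        hops.append(frame)
    return hops


start = Frame(src_mac="02:00:00:00:00:01", dst_mac="02:00:00:00:01:01",
              src_ip="192.0.2.10", dst_ip="203.0.113.7", ttl=64)
for hop in forward(start):
    print(hop)
```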
Step 2 — TCP handshake
The TCP handshake starts with the client sending a connection request (SYN) with an initial sequence number, say X. If the server has resources (port, worker thread, memory, etc.) available to serve the request, it responds with a SYN-ACK carrying its own initial sequence number, say Y, and an acknowledgment number of X+1, confirming receipt of the client's SYN. The client completes the handshake with an ACK whose acknowledgment number is Y+1.
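The sequence and acknowledgment arithmetic of the three-way handshake can be traced with a small sketch. The segments are plain records rather than real packets, the function name is hypothetical, and fixed initial sequence numbers (1000 and 5000) are used for readability, whereas real stacks choose hard-to-predict ISNs.

```python
from dataclasses import dataclass
from typing import List, Optional

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: str          # "SYN", "SYN-ACK" or "ACK"
    seq: int            # sender's sequence number
    ack: Optional[int]  # acknowledgment number (None when the ACK flag is unset)


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    # 1. Client -> server: SYN with the client's initial sequence number X.
    syn = Segment("SYN", seq=client_isn, ack=None)
    # 2. Server -> client: SYN-ACK with its own ISN Y, acknowledging X + 1
    #    (the SYN itself consumes one sequence number).
    syn_ack = Segment("SYN-ACK", seq=server_isn, ack=(syn.seq + 1) % SEQ_SPACE)
    # 3. Client -> server: ACK acknowledging Y + 1; its own seq is now X + 1.
    ack = Segment("ACK", seq=(client_isn + 1) % SEQ_SPACE,
                  ack=(syn_ack.seq + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```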
PRO TIP: The TCP/IP stack identifies a connection by a socket pair (client socket, server socket). The socket pair must be unique for every client-server connection.
How can an HTTP server handle multiple requests on port 80? Generally speaking, an httpd (HTTP daemon) runs on the web server and listens for incoming client connections on a given port (typically 80 for HTTP, 443 for HTTPS). The daemon hands each incoming connection to a worker thread (assuming resources like CPU, memory, storage, etc. are available to serve it) so that the listening socket is never blocked⁸.
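The listen-then-hand-off pattern can be sketched with Python's socket and threading modules. This is a toy, not an httpd: it binds to a free local port chosen by the OS (port 0) instead of port 80, which usually needs elevated privileges; it sends a fixed, hypothetical response; and it starts one thread per connection, whereas production servers typically use thread pools or event loops. The helper names are hypothetical.

```python
import socket
import threading

RESPONSE = (b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n"
            b"Connection: close\r\n\r\nhello")


def handle_client(conn: socket.socket) -> None:
    # Runs in a worker thread, so a slow client never blocks accept().
    with conn:
        conn.recv(4096)          # read (and ignore) the request
        conn.sendall(RESPONSE)


def accept_loop(listener: socket.socket, max_conns: int) -> None:
    # The listening socket only accepts; each connection gets its own thread.
    for _ in range(max_conns):
        conn, _addr = listener.accept()
        threading.Thread(target=handle_client, args=(conn,), daemon=True).start()


def start_server(max_conns: int) -> socket.socket:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen()
    threading.Thread(target=accept_loop, args=(listener, max_conns), daemon=True).start()
    return listener


def fetch(port: int) -> bytes:
    with socket.create_connection(("127.0.0.1", port), timeout=5) as s:
        s.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:         # server closed the connection
                break
            chunks.append(data)
        return b"".join(chunks)


listener = start_server(max_conns=2)
port = listener.getsockname()[1]
print(fetch(port))
print(fetch(port))
listener.close()
```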
OK, but what happens when multiple connections originate from the same client to the same server? Before we answer the question, let's brush up on some basics. A connection between client and server is identified by a socket pair: the client socket and the server socket. This socket pair must be unique for the TCP/IP stack to identify the connection. A socket, in turn, is identified by the triad of IP address, port, and protocol⁹.
Coming back to the question: when multiple connections originate from the same client, the OS assigns a different ephemeral (temporary) source port to each connection. Because the client ports differ, each socket pair (client socket, server socket) is unique, and TCP has no trouble telling the connections apart.
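The uniqueness argument can be sketched as a connection table keyed by the socket pair. The class and method names are hypothetical; the ephemeral range used is the IANA-suggested 49152 to 65535 (Linux defaults to 32768 to 60999 via net.ipv4.ip_local_port_range); ports are handed out sequentially as a simplification, whereas real kernels randomize; and since every entry here is TCP, the protocol is left out of the key.

```python
from itertools import count
from typing import Dict, NamedTuple


class Endpoint(NamedTuple):
    ip: str
    port: int


class SocketPair(NamedTuple):
    client: Endpoint
    server: Endpoint


EPHEMERAL_START, EPHEMERAL_END = 49152, 65535  # IANA dynamic/private range


class ClientStack:
    """Toy model of how a client OS keeps its connections apart."""

    def __init__(self, ip: str) -> None:
        self.ip = ip
        self._ports = count(EPHEMERAL_START)  # simplification: sequential, no reuse
        self.connections: Dict[SocketPair, str] = {}

    def connect(self, server: Endpoint, label: str) -> SocketPair:
        port = next(self._ports)
        if port > EPHEMERAL_END:
            raise RuntimeError("ephemeral ports exhausted")
        pair = SocketPair(Endpoint(self.ip, port), server)
        if pair in self.connections:  # cannot happen: each port is new
            raise RuntimeError("duplicate socket pair")
        self.connections[pair] = label
        return pair


stack = ClientStack("192.168.1.10")        # hypothetical client
web = Endpoint("203.0.113.7", 443)          # hypothetical server socket
tab1 = stack.connect(web, "tab 1")
tab2 = stack.connect(web, "tab 2")
print(tab1)
print(tab2)
```

Both connections share the same server socket; only the client port differs, and that alone is enough to make the two socket pairs distinct.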
Step 3 — TLS handshake
TLS sits between the transport layer (TCP) and the application layer (it is often mapped to the session/presentation layers of the OSI model) and secures the connection between client and server by providing confidentiality, authentication, and integrity. TLS 1.3 and 1.2 are prevalent nowadays. SSL was the earlier standard, but because of its security vulnerabilities it has been deprecated, as have TLS 1.0 and 1.1.
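On the client side, refusing SSL and older TLS versions is usually a one-line policy in the TLS library. Here is a minimal sketch with Python's standard ssl module (Python 3.7+), assuming you want to allow only TLS 1.2 and 1.3; the function name is hypothetical, and it only configures a context without opening any connection.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    # create_default_context() turns on certificate verification and
    # hostname checking, i.e. the "authentication" part of TLS.
    ctx = ssl.create_default_context()
    # Refuse SSLv3, TLS 1.0 and TLS 1.1; negotiate TLS 1.2 or 1.3 only.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_client_context()
print(ctx.minimum_version, ctx.verify_mode, ctx.check_hostname)
```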
TLS 1.3 significantly reduces handshake latency: a full handshake completes in 1 RTT, compared to 2 RTT for TLS 1.2. For repeat connections, TLS 1.3 supports 0-RTT resumption, compared to 1 RTT for an abbreviated (resumed) TLS 1.2 handshake.
The figure below shows the TLS 1.3 handshake between client and server. We will look at TLS 1.3 in more detail in another blog post.
PRO TIP: HTTPS = HTTP + TLS (formerly SSL) over TCP (for HTTP/1.1 and HTTP/2; HTTP/3 runs over QUIC, which uses UDP).
Step 4 — HTTP GET
After the TLS handshake is completed, the browser issues the actual GET request to the server for the resources. The GET request and subsequent communication with the server are all encrypted.
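The whole flow from the client's point of view (DNS, TCP, TLS, then GET) can be sketched with Python's standard library. This is an illustration under simplifying assumptions, not how a browser is built: the helper names are hypothetical, only the first DNS result is tried, the request is a bare HTTP/1.1 GET with "Connection: close", and the response is read until the server closes the connection.

```python
import socket
import ssl


def build_get_request(host: str, path: str = "/") -> bytes:
    # HTTP/1.1 requires a Host header; "Connection: close" asks the server
    # to close the connection after responding, so we can read until EOF.
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"Connection: close\r\n"
            f"\r\n").encode("ascii")


def https_get(host: str, path: str = "/", port: int = 443,
              timeout: float = 10.0) -> bytes:
    # Step 1: DNS resolution through the OS resolver (first result only).
    family, socktype, proto, _canonname, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    with socket.socket(family, socktype, proto) as raw:
        raw.settimeout(timeout)
        # Step 2: connect() triggers the TCP three-way handshake in the kernel.
        raw.connect(addr)
        # Step 3: TLS handshake; server_hostname enables SNI and hostname checks.
        ctx = ssl.create_default_context()
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            # Step 4: the GET request travels encrypted inside the TLS session.
            tls.sendall(build_get_request(host, path))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)


print(build_get_request("www.xyz.com"))
```

With network access, calling https_get("www.example.com") would return the raw status line, headers, and body; www.xyz.com is only the article's placeholder host.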
The figure below shows how a user request is translated to an HTTP GET request.
PRO TIP: With TLS 1.3, a new connection⁷ takes 3 RTT (1 for TCP + 1 for TLS + 1 for HTTP) plus the time taken to resolve DNS. A resumed connection using 0-RTT takes 2 RTT (1 for TCP + 0 for TLS + 1 for HTTP) plus DNS. With TLS 1.2, add one extra RTT in either case.
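The RTT accounting in the tip above can be written as a small function. It is a back-of-the-envelope model under the tip's own assumptions (one RTT each for TCP and for the HTTP request/response, a TLS cost that depends on version and resumption, DNS added separately); the function name is hypothetical, and it ignores server processing time, bandwidth, TCP Fast Open, and QUIC.

```python
def time_to_first_response_ms(rtt_ms: float, tls_version: str, resumed: bool,
                              dns_ms: float = 0.0) -> float:
    """Estimated latency until the HTTP response arrives, in milliseconds."""
    tls_rtts = {
        ("1.3", False): 1,  # full TLS 1.3 handshake
        ("1.3", True): 0,   # 0-RTT: the request rides along with ClientHello
        ("1.2", False): 2,  # full TLS 1.2 handshake
        ("1.2", True): 1,   # abbreviated TLS 1.2 handshake (session resumption)
    }[(tls_version, resumed)]
    tcp_rtts, http_rtts = 1, 1
    return dns_ms + (tcp_rtts + tls_rtts + http_rtts) * rtt_ms


# Hypothetical 50 ms RTT and 20 ms DNS lookup.
for version in ("1.3", "1.2"):
    for resumed in (False, True):
        print(version, resumed, time_to_first_response_ms(50, version, resumed, dns_ms=20))
```

For example, a new TLS 1.3 connection with a 50 ms RTT and a 20 ms DNS lookup comes to 20 + 3 × 50 = 170 ms under this model.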
I hope you found this article helpful. Stay tuned for my next blog post.
Happy Learning 😎