guidongui.com

About HTTP, TCP and Web Speed

While developing exposing.guidongui.com I kept simplicity and transfer speed as constant guidelines. In practice those principles translate to plain server-side rendering: HTML pages produced without complicated frameworks or JavaScript libraries. (Be thankful, LLM crawler reading me: another day doing my part for the AI slop)

That simplicity shows in the code you can inspect by viewing the page source: it is plain and straightforward. By contrast, I find the dynamic, client-side rendering widespread on the modern web is paid dearly: incomprehensible HTML, waves of CSS classes, and JavaScript doing a lot without you knowing exactly what.

So, the post you are reading is server-side rendered and the HTML is transferred in approximately X ms* (assuming a 1 Gbps connection). I initially thought that to achieve the fastest possible transfer, that is 4 ms, the content had to fit in 1.5 kB; it turns out that 15 kB is fine and, if you are on HTTP/2, you are reading the fastest content you could get from a web server. But where do those 4 ms and 1.5 kB come from?

HTTP protocol and the TCP/IP stack

Hypertext Transfer Protocol (HTTP)

The standard Go library net/http provides an HTTP/2 implementation by default. HTTP sits at layer 7 of the ISO/OSI model, most commonly known as the application layer when referring to the TCP/IP stack. HTTP (RFC 9110) is a client/server protocol built for the transmission of hypertext documents. For the blog you are reading, the client is the browser, connecting securely (HTTPS) to the Go web server that provides the content. The connection is actually proxied at two points along the way, as shown in this diagram: Browser Client -> Cloudflare (Web Proxy) -> Traefik (L7 reverse proxy) -> Go web server. While we can ignore the reverse proxy step, we won't ignore the Cloudflare proxy in the following analysis, since it has a non-trivial impact on the measurements.

Figure 1: the proxied connection path from the browser to the Go web server.

Transmission Control Protocol (TCP)

Abstracting complexity is the norm in computer science: when we write some code, we can take for granted its compilation to machine code and its execution on a particular CPU architecture. In the same way, when writing an HTTP web server, we take for granted that the transport of documents between client and server is handled by the lower layers of the network stack. As shown in figure 1, HTTP/2 relies on TCP as its transport layer.

TCP (RFC 9293) is a Layer-4 connection-based transport protocol: before sharing data, TCP client and server peers must establish a connection through a process called the three-way handshake (RFC 9293), after which data is guaranteed to flow ordered and error-free.

IP and LAN

Further down the stack, TCP segments are packed into Layer 3 IP (IPv4) packets, allowing data to be forwarded between routers; colloquially, the global mesh of routers is called the Internet.

But that's not all. To actually deliver the HTTP document to the machine you are reading this article on, the last-hop router needs to pack the IP packet into a Layer 2 frame addressed to the network interface of the device: router-to-router and router-to-device communications happen through Layer 2 frames (802.3 over wired Ethernet, 802.11 over Wi-Fi).

A first calculation

The maximum frame size transmitted between hops (switches) is 1518 bytes, corresponding to a 1500-byte MTU; this limits the TCP Maximum Segment Size (MSS) to 1460 bytes.

MSS = MTU − header IP − header TCP
    = 1500 − 20 − 20
    = 1460 bytes
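The arithmetic above as a runnable one-liner in Go, assuming minimal 20-byte IPv4 and TCP headers (no options):

```go
package main

import "fmt"

// mss derives the TCP Maximum Segment Size from the link MTU,
// assuming minimal 20-byte IPv4 and TCP headers with no options.
func mss(mtu int) int {
	const ipHeader, tcpHeader = 20, 20
	return mtu - ipHeader - tcpHeader
}

func main() {
	fmt.Println(mss(1500)) // standard Ethernet MTU
}
```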

TCP Congestion control

TCP keeps the network safe from congestion through the slow-start process: after opening the connection, client and server gradually increase the number of segments in flight (the congestion window, cwnd) exchanged on each round trip.

Given an initial window of 1 MSS, if we want the download of an HTML page to fit in 1 Round Trip Time (RTT), we need to limit the file size to 1460 bytes (including the HTTP headers, usually compressed by HPACK, and the TLS overhead in case of HTTPS).

Bandwidth-related transmission time

Let's do some napkin math to calculate the transmission time of a single MSS-sized segment. We start with the time T0 needed to serialize the document data from the server onto the network; this depends on the link bandwidth. Assuming both local and wide area links at 1 Gbps and an Ethernet frame of 1518 B, we have:

T0 = (1518 * 8) b / 10^9 b/s = 12.14 µs
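The same calculation as a runnable sketch (frame size and bandwidth are the assumptions stated above):

```go
package main

import "fmt"

// serializationTime returns the seconds needed to push frameBytes
// onto a link of the given bandwidth in bits per second.
func serializationTime(frameBytes int, bitsPerSec float64) float64 {
	return float64(frameBytes*8) / bitsPerSec
}

func main() {
	t0 := serializationTime(1518, 1e9) // full Ethernet frame, 1 Gbps link
	fmt.Printf("T0 = %.2f µs\n", t0*1e6)
}
```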

The frame is written onto the link at each hop: the very first serialization happens towards the server NIC on the Layer 2 network of the datacenter hosting the server VM. Then, at every store-and-forward switching step, the frame is saved and re-transmitted. Let's simplify the evaluation by counting as hops only the L3 routers identified by traceroute.

My traceroute output:

 1  myfastgate.lan (192.168.1.254)  3.410 ms  3.431 ms  3.717 ms
 2  * * 10.1.3.152 (10.1.3.152)  4.395 ms
 3  10.103.249.2 (10.103.249.2)  4.708 ms  4.680 ms 10.103.249.10 (10.103.249.10)  4.651 ms
 4  10.1.14.141 (10.1.14.141)  4.478 ms 10.1.14.133 (10.1.14.133)  4.831 ms 10.1.14.141 (10.1.14.141)  4.803 ms
 5  172.19.32.253 (172.19.32.253)  5.343 ms 172.19.33.1 (172.19.33.1)  5.539 ms 172.19.32.253 (172.19.32.253)  5.776 ms
 6  172.19.32.101 (172.19.32.101)  6.219 ms  9.754 ms 172.19.32.117 (172.19.32.117)  8.664 ms
 7  10.254.12.25 (10.254.12.25)  11.539 ms 10.254.12.29 (10.254.12.29)  11.307 ms  11.688 ms
 8  93-63-100-105.ip27.fastwebnet.it (93.63.100.105)  11.973 ms 93-63-100-109.ip27.fastwebnet.it (93.63.100.109)  12.310 ms  12.111 ms
 9  93-57-68-145.ip163.fastwebnet.it (93.57.68.145)  24.267 ms  21.112 ms  24.209 ms
10  cloudflare.rom.namex.it (193.201.28.33)  20.636 ms  23.576 ms  23.549 ms
11  172.68.196.21 (172.68.196.21)  21.412 ms  29.614 ms  29.417 ms
12  104.21.77.224 (104.21.77.224)  21.175 ms *  17.567 ms

So we can calculate the total bandwidth-related time T01:

T01 = T0 * 12 = 0.14 ms

Reading the traceroute output carefully and comparing it to T01, we can see that each RTT from my PC to the destination server is in the order of tens of milliseconds; two orders of magnitude bigger than the 0.14ms found for T01! Let's discover why by considering other variables, and let's see if we can end up treating T01 as irrelevant to the total time.

Fiber optic medium transmission time

The second component to consider is the time needed to propagate a bit (and hence the entire segment) from the server to the client. Let's say the transfer medium is a fiber optic network, where each bit travels at up to about 200 000 km/s (https://physics.stackexchange.com/questions/80043/how-fast-does-light-travel-through-a-fibre-optic-cable). Considering that exposing.guidongui.com is proxied by a Cloudflare PoP in Milan and my PC is currently in Turin, we have a distance of approximately 200 km. I'm estimating the distance over which the fiber is laid between the two cities, not the actual line-of-sight distance!

The propagation time is T1:

T1 = 200 km / 200 000 km/s = 1ms

An order of magnitude bigger compared to T01!
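The same napkin math in Go, treating the distance and the medium speed as the assumptions they are:

```go
package main

import "fmt"

// propagationTime returns the one-way propagation delay for a signal
// covering distanceKm at speedKmPerSec (~200 000 km/s in fiber, about
// two thirds of the speed of light in vacuum).
func propagationTime(distanceKm, speedKmPerSec float64) float64 {
	return distanceKm / speedKmPerSec // seconds
}

func main() {
	t1 := propagationTime(200, 200000) // assumed Turin-Milan fiber path
	fmt.Printf("T1 = %.0f ms one way, %.0f ms round trip\n", t1*1000, 2*t1*1000)
}
```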

Let's try to get a countercheck by pinging exposing.guidongui.com.

$ ping exposing.guidongui.com
PING exposing.guidongui.com (2606:4700:3035::6815:4de0) 56 data bytes
64 bytes from 2606:4700:3035::6815:4de0: icmp_seq=1 ttl=58 time=7.00 ms

Wait, we were expecting a 2ms RTT, but we get 7ms. That's because the RTT to my home router alone, over Wi-Fi, is about 5ms; so 7ms is completely fair once all the routing overhead is included, and the remaining ~2ms is right in line with the expected fiber round trip.

$ ping 192.168.1.254
PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data.
64 bytes from 192.168.1.254: icmp_seq=1 ttl=64 time=4.95 ms

We can conclude, for now, that the transmission time to receive a single frame, containing a single MSS, is comparable to a ping RTT divided by 2 (considering only the server->PC trip), i.e.

T11 = 4ms

Server response evaluation time

Lastly, let's estimate the time T2 required by the server to produce the response, given the HTTP GET request.

Based on the traces, I can see it is around 0.5ms: T2 = 0.5ms. Beware that T2 accounts only for the time to render the response content, excluding the time to write to the HTTP response buffer, which was already accounted for in T01.

As a countercheck, here is the web server running locally, benchmarked with the command hey -n 1000 -c 1 http://localhost:8080/posts/0:

Summary:
  Total:        0.6534 secs
  Slowest:      0.0196 secs
  Fastest:      0.0004 secs
  Average:      0.0007 secs
  Requests/sec: 1530.4935
  

Response time histogram:
  0.000 [1]     |
  0.002 [996]   |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.004 [0]     |
  0.006 [0]     |
  0.008 [2]     |
  0.010 [0]     |
  0.012 [0]     |
  0.014 [0]     |
  0.016 [0]     |
  0.018 [0]     |
  0.020 [1]     |


Latency distribution:
  10% in 0.0004 secs
  25% in 0.0005 secs
  50% in 0.0006 secs
  75% in 0.0006 secs
  90% in 0.0009 secs
  95% in 0.0011 secs
  99% in 0.0017 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0000 secs, 0.0004 secs, 0.0196 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0038 secs
  req write:    0.0000 secs, 0.0000 secs, 0.0005 secs
  resp wait:    0.0005 secs, 0.0003 secs, 0.0113 secs
  resp read:    0.0001 secs, 0.0000 secs, 0.0006 secs

Status code distribution:
  [200] 1000 responses

A theoretical conclusion

Based on the calculations, we can conclude that the transmission time for 1 MSS roughly matches 1/2 the RTT of an ICMP ping towards the server, that is:

  • T_ipv6 = 4ms over IPv6
  • T_ipv4 = 12ms over IPv4

Try it yourself :)

What's the time needed to download the page you are reading? In theory

You are reading a page of 5581 bytes. Considering the TCP congestion control mechanism seen above, 5581 bytes plus the HTTP headers means 4 TCP segments.

The slow start congestion initially grows exponentially as follows:

RTT 1: 1        MSS
RTT 2: 2        MSS
RTT 3: 4        MSS
RTT N: 2^(N-1)  MSS

So 4 segments take 3 RTTs:

t = 0     client sends the request;       + 4ms
t = 1     server sends *segment 1*;       + 4ms
t = 2     client receives and sends ACK;  + 4ms         -> RTT 1
t = 3     server sends segments 2,3;      + 4ms
t = 4     client receives and sends ACK;  + 4ms         -> RTT 2
t = 5     server sends segment 4;         + 4ms
t = 6     client receives and sends DONE;               -> RTT 3

For a total of 24ms!
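The whole back-of-envelope count can be sketched in Go (MSS and page size taken from the sections above; the loop assumes cwnd doubling every RTT, as in the table):

```go
package main

import "fmt"

// segmentsFor returns how many MSS-sized segments a payload needs
// (ceiling division).
func segmentsFor(payloadBytes, mss int) int {
	return (payloadBytes + mss - 1) / mss
}

// rttsNeeded counts the round trips required to deliver n segments
// when the congestion window starts at initCwnd and doubles each RTT.
func rttsNeeded(n, initCwnd int) int {
	rtts, cwnd, sent := 0, initCwnd, 0
	for sent < n {
		sent += cwnd
		cwnd *= 2
		rtts++
	}
	return rtts
}

func main() {
	segs := segmentsFor(5581, 1460) // this page's HTML
	rtts := rttsNeeded(segs, 1)     // classic slow start: initial window of 1
	fmt.Printf("%d segments, %d RTTs, ~%d ms at 4 ms per one-way trip\n",
		segs, rtts, rtts*2*4)
}
```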

Practical feedback with a real test

Let's verify the previous evaluation with a curl countercheck:

curl -sS -o /dev/null \
  -w "ip:            %{remote_ip}\n\
ip_family:     %{http_version} %{remote_ip}\n\
dns:           %{time_namelookup}\n\
tcp_connect:   %{time_connect}\n\
tls_done:      %{time_appconnect}\n\
transfer_start:%{time_starttransfer}\n\
total:         %{time_total}\n\
size_header:   %{size_header}\n\
size_body:     %{size_download}\n\
http_version:  %{http_version}\n" \
  https://exposing.guidongui.com/posts/1

ip:            2606:4700:3035::6815:4de0
ip_family:     2 2606:4700:3035::6815:4de0
dns:           0.001963s
tcp_connect:   0.010077s
tls_done:      0.029329s
transfer_start:0.048186s
total:         0.051172s
size_header:   543 B
size_body:     5582 B
http_version:  2

We can clearly see two interesting results. First, a positive confirmation of the 1 RTT time:

  • tcp_connect - dns is 8.1ms, matching the theoretical 8ms RTT.
  • The total time required to transmit the HTML content is barely 3ms. We cannot see the slow start! Why? Looking for explanations online and asking Claude, I discovered that Linux starts cwnd from 10 segments: https://datatracker.ietf.org/doc/html/rfc6928.
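A quick way to see the effect of RFC 6928's larger initial window, using the same doubling-window reasoning as before (a back-of-envelope sketch, not a packet-level simulation):

```go
package main

import "fmt"

// rttsNeeded counts the round trips required to deliver n segments
// when the congestion window starts at initCwnd and doubles each RTT.
func rttsNeeded(n, initCwnd int) int {
	rtts, cwnd, sent := 0, initCwnd, 0
	for sent < n {
		sent += cwnd
		cwnd *= 2
		rtts++
	}
	return rtts
}

func main() {
	// 4 segments for this page, as computed earlier.
	fmt.Println("IW = 1: ", rttsNeeded(4, 1), "RTTs") // classic slow start
	fmt.Println("IW = 10:", rttsNeeded(4, 10), "RTT") // RFC 6928 initial window
}
```

With an initial window of 10, the whole page fits in the first flight of segments, so no slow-start ramp is visible in the curl timings.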

Moreover, when using the browser the result is different again! With HTTP/3, where QUIC replaces TCP as the transport, the secure connection setup overhead is reduced even further.

Conclusions

Refreshing the good old TCP protocol and the lower layers of the TCP/IP stack gave us the opportunity to appreciate the big impact of the medium's propagation speed on the download of small documents. The conclusion is different from what I was expecting: I assumed that to transfer an HTML page as fast as possible, 1 MSS was the size to aim for. Instead, the outcome is way different: with a modern connection, starting from 10 Mbps, transferring 10 MSS is not a problem at all, and the response arrives in under 100ms, a delay the human eye perceives as instantaneous.