Foundations of Distributed Systems

January 7, 2026

Note: All code examples are in Rust. You should be comfortable with basic Rust: ownership, borrowing, structs, enums, Result, and match. If terms like impl or &mut self are unfamiliar, work through the relevant Rust Book chapters first; I already have notes on those.

How This Document Is Organized

This guide has two distinct parts:

Part                                  What It Covers                                                        Why You Need It
Part A: Prerequisites                 Networking fundamentals, how computers communicate                    You can't understand "the network is unreliable" without knowing what a network actually is
Part B: Distributed Systems Concepts  The distributed systems theory: fallacies, failures, time, ordering   This is the core content

Suggested approach:

  1. Read Part A if networking is new to you (or skim if you're familiar)
  2. Read Part B carefully — this is the main part (code examples are embedded with concepts)

PART A: PREREQUISITES

Networking Fundamentals & Rust Basics

Skip this part if: You already understand TCP/IP, can explain what happens when you visit a website, and know what ports are.


A.1: What is a Distributed System?

A distributed system is a collection of independent computers that appears to its users as a single coherent system.

When you use Google Search, you don't think "I'm querying thousands of machines across dozens of data centers." You just see a search box and get results. That's the magic, and the challenge, of distributed systems.

But let's make this concrete. What does "computers talking to each other" actually mean?

What's Actually Happening (The Physical Reality)

When your computer sends data to a server, here's the physical reality:

  1. Your computer converts data into electrical signals
  2. Those signals travel through cables (ethernet, fiber optic) or radio waves (WiFi)
  3. They pass through multiple intermediate devices (routers, switches)
  4. Eventually they reach the destination computer
  5. That computer converts the signals back into data

Key insight: At the physical level, all you can do is send electrical pulses. Everything else (files, messages, web pages) is abstraction built on top of this.

The Communication Problem

Imagine you can only communicate by flashing a light on and off. How do you send the message "Hello"?

You need to agree on:

  1. Encoding: What patterns mean what? (Like Morse code)
  2. Timing: How long is a dot vs. a dash? When does one letter end and the next begin?
  3. Error handling: What if the other person blinks and misses a flash?
  4. Addressing: If multiple people are watching, who is this message for?

Networks solve all these problems through protocols: agreed-upon rules for communication. The internet is built from layers of protocols, each solving different problems.
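
To make "agreed-upon rules" concrete, here is a toy sketch (my own, not any real protocol) of length-prefix framing in Rust: the sender writes a 4-byte length before every message, so the receiver always knows where one message ends and the next begins; that is exactly the timing/framing problem from the list above.

use std::io::{Cursor, Read, Result, Write};

// Write one message: a 4-byte big-endian length, then the payload.
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> Result<()> {
    w.write_all(&(payload.len() as u32).to_be_bytes())?;
    w.write_all(payload)
}

// Read one message back by first reading the 4-byte length.
fn read_frame<R: Read>(r: &mut R) -> Result<Vec<u8>> {
    let mut len_bytes = [0u8; 4];
    r.read_exact(&mut len_bytes)?;
    let len = u32::from_be_bytes(len_bytes) as usize;
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(payload)
}

fn main() -> Result<()> {
    // Two messages share one byte stream; framing keeps them separate.
    let mut wire = Vec::new();
    write_frame(&mut wire, b"Hello")?;
    write_frame(&mut wire, b"world")?;

    let mut reader = Cursor::new(wire);
    println!("{}", String::from_utf8_lossy(&read_frame(&mut reader)?));
    println!("{}", String::from_utf8_lossy(&read_frame(&mut reader)?));
    Ok(())
}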

Layers of Abstraction

Computer networking is built in layers:

┌─────────────────────────────────────────┐
│  Application Layer (HTTP, SMTP, etc.)   │  ← "I want to load a webpage"
├─────────────────────────────────────────┤
│  Transport Layer (TCP, UDP)             │  ← "Deliver this data reliably"
├─────────────────────────────────────────┤
│  Network Layer (IP)                     │  ← "Route this to the right machine"
├─────────────────────────────────────────┤
│  Link Layer (Ethernet, WiFi)            │  ← "Send bits to the next hop"
├─────────────────────────────────────────┤
│  Physical Layer                         │  ← Actual electrical/light signals
└─────────────────────────────────────────┘

Why layers? Each layer only talks to the layers directly above and below it. This means each layer can evolve or be swapped out without touching the others, and application code never has to think about voltages on a wire.

Analogy: Sending a letter. You write the message (application), put it in an addressed envelope (transport/network), and the postal system worries about trucks and planes (link/physical).

You don't think about the trucks when writing. The layers hide complexity.


A.2: IP Addresses and Packets

IP Addresses: How Machines Are Identified

Every device on the internet has an IP address: a unique identifier, like a phone number for computers.

IPv4 addresses look like: 192.168.1.1 or 142.250.80.46

IPv6 addresses look like: 2001:0db8:85a3:0000:0000:8a2e:0370:7334

Special addresses you'll see often:

  1. 127.0.0.1 (localhost): always this machine; traffic to it never leaves your computer
  2. 0.0.0.0: "all interfaces," used when binding a server that should accept connections from any network
  3. 192.168.x.x and 10.x.x.x: private ranges used on local networks, not routable on the public internet

When you type google.com in your browser, your computer first asks "what's the IP address for google.com?" The answer might be 142.250.80.46. Then your computer talks to that IP address. The translation from name to IP is done by DNS (Domain Name System): itself a massive distributed system.

Packets: Data Travels in Chunks

Here's something that might surprise you: when you download a 1GB file, it doesn't travel as one continuous stream of 1 billion bytes. Instead, it's broken into thousands of small packets, typically around 1,500 bytes each.

Why packets?

  1. Sharing: Imagine a single phone line between two cities. If one person made a call that lasted an hour, no one else could use the line. But if we break conversations into tiny chunks and interleave them, many conversations can share the same line. That's what packet switching does: your Netflix stream, your neighbor's video call, and a business's database queries all share the same wires, packet by packet.

  2. Reliability: If you sent 1GB as one continuous stream and there was an error at byte 999,999,999, you'd have to resend the entire gigabyte. With packets, you only resend the one packet that failed.

  3. Routing: Different packets can take different paths. If one route is congested, packets can go around it.

What's in a packet?

Every packet has two parts: a header (metadata) and a payload (actual data).

┌──────────────────────────────────────────────────────┐
│  HEADER (metadata about this packet)                 │
│  ┌─────────────────┬──────────────────────────────┐  │
│  │ Source IP       │ 192.168.1.100                │  │  ← Who sent this
│  │ Destination IP  │ 142.250.80.46                │  │  ← Where it's going
│  │ Protocol        │ TCP                          │  │  ← How to interpret payload
│  │ Length          │ 1420 bytes                   │  │  ← How big the payload is
│  │ Checksum        │ 0x3a7f                       │  │  ← For error detection
│  │ TTL             │ 64                           │  │  ← Hops before packet dies
│  │ ... more fields │                              │  │
│  └─────────────────┴──────────────────────────────┘  │
├──────────────────────────────────────────────────────┤
│  PAYLOAD (actual data being sent)                    │
│  "GET /index.html HTTP/1.1\r\nHost: google.com..."   │
│  (up to ~1460 bytes for typical internet traffic)    │
└──────────────────────────────────────────────────────┘

Think of it like a postcard: the address side is the header, the message side is the payload.

Routing: How Packets Find Their Way

Your packet doesn't teleport directly to its destination. It hops through multiple routers: devices whose job is to forward packets toward their destination.

Your Computer 
    ↓
Your Home Router (192.168.1.1)
    ↓
ISP's Router
    ↓
Regional Hub
    ↓
... (5-20 more hops typically) ...
    ↓
Google's Edge Router
    ↓
Google's Server (142.250.80.46)

Each router looks at the destination IP address and decides: "based on my routing table, which of my neighbors is closest to this destination?" Then it forwards the packet that direction.

You can see this yourself! In a terminal, run:

# On Mac/Linux:
traceroute google.com

# On Windows:
tracert google.com

You'll see each hop your packets take, with timing for each. This is invaluable for debugging network issues.

What Can Go Wrong at the IP Layer

Here's the crucial thing: IP provides no guarantees. It's "best effort" delivery.

  1. Packets can be lost: A router gets overloaded and drops packets. A cable gets damaged. Radio interference corrupts a WiFi packet. The packet just... disappears.

  2. Packets can arrive out of order: Packet 3 might take a different route than packets 1 and 2, and arrive first.

  3. Packets can be duplicated: A router isn't sure if it forwarded a packet successfully, so it sends it twice.

  4. Packets can be corrupted: A cosmic ray flips a bit. Electrical interference on a wire changes a 1 to a 0. (Checksums catch most of these, but not all.)

  5. Packets can be delayed: Traffic jam at a router. Your packet waits in a queue.

This is why distributed systems are hard. The network will fail. The question is how you handle it.


A.3: TCP: Making Communication Reliable

The Problem

IP gives us the ability to send packets between machines, but with no guarantees. For many applications, that's not good enough: a file transfer can't tolerate missing bytes, a database can't apply writes out of order, and a payment must not be processed twice because a packet was duplicated.

TCP: Transmission Control Protocol

TCP is a protocol built on top of IP. IP handles getting individual packets to the right machine. TCP handles making that communication reliable and ordered.

TCP provides:

  1. Reliable delivery: If a packet is lost, TCP automatically resends it
  2. Ordering: TCP reassembles packets in the correct order, even if they arrive out of order
  3. Flow control: TCP slows down if the receiver can't keep up
  4. Congestion control: TCP slows down if the network is overloaded

How TCP Knows When Packets Are Lost: Acknowledgments

The key mechanism is acknowledgments (ACKs). When the receiver gets data, it sends back a message saying "I got it."

Sender                           Receiver
   │                                │
   │──── Data: bytes 0-999 ────────>│
   │                                │  Receiver got it!
   │<─────────────── ACK 1000 ──────│  "I've received up to byte 999,
   │                                │   send byte 1000 next"
   │                                │
   │──── Data: bytes 1000-1999 ────>│  (this packet gets lost!)
   │                                │
   │     ... sender waits ...       │
   │     ... no ACK comes ...       │
   │     ... timeout! ...           │
   │                                │
   │──── Data: bytes 1000-1999 ────>│  (resend!)
   │                                │
   │<─────────────── ACK 2000 ──────│  "Got it, send byte 2000 next"

Key insight: TCP uses timeouts to detect loss. If no acknowledgment arrives within a certain time, TCP assumes the packet was lost and resends it. This is why "timeout" is such an important concept in distributed systems.

Sequence Numbers: Handling Ordering and Duplicates

Each byte of data in a TCP connection has a sequence number. This solves two problems (a code sketch follows the list):

  1. Ordering: If packet with bytes 2000-2999 arrives before packet with bytes 1000-1999, TCP holds onto the later packet and waits for the earlier one. It delivers data to your application in order.

  2. Duplicates: If the network accidentally duplicates a packet, TCP sees "I already have bytes 1000-1999" and discards the duplicate.
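
A toy sketch of the reassembly idea (illustrative only; real TCP is far more involved): buffer out-of-order segments keyed by sequence number, and hand bytes to the application only once the next expected segment has arrived. It assumes segments never partially overlap, which real TCP must also handle.

use std::collections::BTreeMap;

// A toy in-order reassembly buffer (not real TCP).
struct Reassembler {
    next_seq: u64,                   // next byte we can deliver
    pending: BTreeMap<u64, Vec<u8>>, // out-of-order segments by sequence number
}

impl Reassembler {
    fn new() -> Self {
        Reassembler { next_seq: 0, pending: BTreeMap::new() }
    }

    // Accept a segment; return any bytes that are now deliverable in order.
    fn receive(&mut self, seq: u64, data: Vec<u8>) -> Vec<u8> {
        if seq < self.next_seq {
            return Vec::new(); // duplicate: these bytes were already delivered
        }
        self.pending.insert(seq, data); // an exact duplicate just overwrites
        let mut deliverable = Vec::new();
        // Drain consecutive segments starting at next_seq.
        while let Some(data) = self.pending.remove(&self.next_seq) {
            self.next_seq += data.len() as u64;
            deliverable.extend(data);
        }
        deliverable
    }
}

fn main() {
    let mut r = Reassembler::new();
    // Bytes 1000-1999 arrive before bytes 0-999: held back, nothing delivered.
    assert!(r.receive(1000, vec![b'B'; 1000]).is_empty());
    // The missing earlier segment arrives: both are delivered, in order.
    assert_eq!(r.receive(0, vec![b'A'; 1000]).len(), 2000);
    println!("delivered 2000 bytes in order");
}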

TCP Connections and the Three-Way Handshake

Unlike IP (which just fires packets into the void), TCP establishes a connection before sending data. Both sides agree: "we're now in a conversation."

This happens through the three-way handshake:

Client                              Server
   │                                   │
   │──── SYN ─────────────────────────>│
   │     "I want to connect.           │
   │      My starting sequence         │
   │      number is 1000."             │
   │                                   │
   │<─────────────────────── SYN-ACK ──│
   │     "OK, I accept.                │
   │      My starting sequence         │
   │      number is 5000.              │
   │      I acknowledge your 1000."    │
   │                                   │
   │──── ACK ─────────────────────────>│
   │     "Great, I acknowledge         │
   │      your 5000."                  │
   │                                   │
   │     === Connection Established === │
   │                                   │

Why three messages?

  1. SYN (synchronize): Client says "I want to connect" and shares its initial sequence number
  2. SYN-ACK: Server says "I accept" and shares its sequence number, while acknowledging the client's
  3. ACK: Client confirms it received the server's info

After this, both sides know the connection is established and have agreed on starting sequence numbers.

Important: A "connection" is a logical concept, not physical. There's no dedicated wire between you and Google. A connection is just both computers agreeing "we're having a conversation" and keeping track of state (sequence numbers, what's been acknowledged, etc.).

Ports: Multiple Connections Per Machine

An IP address identifies a machine. But a machine runs many programs: a web server, a database, an SSH daemon. How does the machine know which program should receive which packets?

Ports. A port is a 16-bit number (0-65535) that identifies a specific application or service.

A full network address is IP:port, like 142.250.80.46:443.

Well-known ports (you'll see these constantly):

Port   Service     Description
22     SSH         Secure shell (remote login)
80     HTTP        Web traffic (unencrypted)
443    HTTPS       Web traffic (encrypted)
5432   PostgreSQL  Postgres database
6379   Redis       Redis cache/database
27017  MongoDB     MongoDB database

When you visit https://google.com, your browser connects to Google's IP on port 443.

But what about your side? Your computer also needs a port for the connection. The operating system assigns a random high port (like 54321) for outgoing connections. So a full connection looks like:

Your machine (192.168.1.100:54321) <═══> Google (142.250.80.46:443)

Each endpoint (an IP address + port) is called a socket. The combination of both endpoints, the 4-tuple (source IP, source port, destination IP, destination port), uniquely identifies a connection.

What TCP Still Can't Solve

TCP makes communication reliable, but it can't work miracles:

  1. If the other machine crashes, TCP can't magically resurrect it. Eventually TCP will timeout and report an error.

  2. If the network is partitioned (all routes between two machines are broken), no amount of retrying will help.

  3. TCP doesn't guarantee speed. It guarantees delivery, but if the network is slow, your data is slow.

  4. Connections can be killed. If a connection is idle for too long, intermediate routers or firewalls might close it.


A.4: UDP: When You Don't Want TCP's Guarantees

Sometimes Reliability Isn't Worth It

TCP's reliability comes at a cost: a round-trip handshake before any data flows, retransmission delays when packets are lost (one missing packet holds up everything behind it), and connection state to maintain on both ends.

For some applications, these tradeoffs don't make sense: a video call would rather skip a lost frame than freeze while it's retransmitted, and a game would rather have the newest player position than an old one delivered reliably.

UDP: User Datagram Protocol

UDP is another protocol built on IP, but much simpler than TCP.

UDP provides: port numbers (so datagrams reach the right application) and an optional checksum. That's essentially it.

UDP does NOT provide: reliability, ordering, duplicate suppression, flow control, or congestion control. A datagram is sent once; if it's lost, it's gone.

Mental model: TCP is a phone call (establish a connection, speak in order, hang up). UDP is postcards: each one is independent, and any of them might be lost, duplicated, or arrive out of order.

When to Use Each

Use TCP when...                            Use UDP when...
Data must be complete and correct          Missing data is acceptable
Order matters                              Latest data is more important than complete data
You're building request/response patterns  You need lowest possible latency
Examples: HTTP, databases, file transfer   Examples: video streaming, games, DNS

Most distributed systems you'll build use TCP. But understanding UDP helps you appreciate what TCP does for you.
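
To see the difference in code, here is a minimal UDP sketch using std's UdpSocket: no connection, no handshake, just datagrams aimed at an address. Both sockets live in one process for convenience, and port 34254 is an arbitrary choice.

use std::net::UdpSocket;

fn main() -> std::io::Result<()> {
    // Two UDP sockets in one process; in reality these would be on
    // different machines. Port 0 asks the OS to pick any free port.
    let server = UdpSocket::bind("127.0.0.1:34254")?;
    let client = UdpSocket::bind("127.0.0.1:0")?;

    // No connect, no handshake: just fire a datagram at an address.
    // If it's lost, nobody retransmits it. That's the UDP deal.
    client.send_to(b"ping", "127.0.0.1:34254")?;

    let mut buf = [0u8; 1500]; // one datagram fits in one typical packet
    let (n, from) = server.recv_from(&mut buf)?;
    println!("server got {:?} from {}", &buf[..n], from);

    // Reply straight back to whoever sent the datagram.
    server.send_to(b"pong", from)?;
    let (n, _) = client.recv_from(&mut buf)?;
    println!("client got {}", String::from_utf8_lossy(&buf[..n]));
    Ok(())
}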


A.5: DNS: Translating Names to Addresses

The Problem

Humans like names: google.com, github.com, api.mycompany.io

Computers need numbers: 142.250.80.46

DNS (Domain Name System) translates names to IP addresses. It's like the internet's phone book.

How DNS Works

When you type github.com in your browser:

1. Browser: "What's the IP for github.com?"
        │
        ▼
2. Check local cache on your machine
   Found and not expired? → Use it
   Not found? → Ask DNS resolver
        │
        ▼
3. DNS Resolver (your ISP, or 8.8.8.8 for Google, 1.1.1.1 for Cloudflare)
   Has it cached? → Return it
   Not cached? → Ask authoritative servers
        │
        ▼
4. Ask Root servers: "Who handles .com?"
        │
        ▼
5. Ask .com TLD servers: "Who handles github.com?"
        │
        ▼
6. Ask GitHub's nameservers: "What's the IP for github.com?"
        │
        ▼
7. Answer: "github.com = 140.82.112.4"
        │
        ▼
8. Response flows back up, cached at each level
        │
        ▼
9. Browser connects to 140.82.112.4
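
You can watch the name-to-IP step from Rust with no extra crates: std's ToSocketAddrs asks the operating system's resolver, which walks exactly the cache-then-DNS path above. (Port 443 is arbitrary here; the API resolves host:port pairs.)

use std::net::ToSocketAddrs;

fn main() -> std::io::Result<()> {
    // "host:port" resolves through the OS resolver (cache, then DNS).
    // One name can map to several addresses (IPv4 and IPv6, replicas).
    for addr in "github.com:443".to_socket_addrs()? {
        println!("github.com resolves to {}", addr);
    }
    Ok(())
}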

DNS Is Itself a Distributed System!

Here's something beautiful: DNS is one of the oldest and most successful distributed systems in existence.

Fun fact: When DNS has problems, the internet appears broken even though all the underlying servers are fine. You can't reach google.com by name, even though 142.250.80.46 is perfectly reachable.

What Can Go Wrong with DNS

  1. DNS server failure: That's why there are many servers with replication
  2. Slow lookups: Usually cached, but first lookup for a domain takes time
  3. DNS spoofing/poisoning: Attacker gives you wrong IP address
  4. Propagation delays: DNS changes can take hours to spread globally
  5. TTL expiration: Cached entries expire; then you need a fresh lookup

A.6: Sockets: How Your Code Talks to the Network

Now we get to code. Everything above is handled by the operating system. Your code interacts with the network through sockets.

What Is a Socket?

A socket is an endpoint for communication. It's the interface between your application and the TCP/IP stack in the operating system.

Think of it like this: a socket is to the network what a file handle is to the filesystem. Your program reads and writes it like a file, and the operating system does the real work (packets, retransmission, routing) underneath.

The Server/Client Pattern

Most network programs follow a pattern:

Server:

  1. Create a socket
  2. Bind it to an address (IP + port): "I will listen on port 8080"
  3. Listen for incoming connections
  4. Accept a connection when a client connects
  5. Read/write data
  6. Close the connection

Client:

  1. Create a socket
  2. Connect to the server's address
  3. Read/write data
  4. Close the connection

Let's implement both in Rust, explaining every line.

A TCP Server in Rust, Explained Line by Line

// === Imports ===

// std::net contains networking primitives
// TcpListener: for accepting incoming connections (server-side)
// TcpStream: a single TCP connection (used by both client and server)
use std::net::{TcpListener, TcpStream};

// std::io contains input/output traits and types
// Read: trait that provides the `read` method
// Write: trait that provides `write` and `write_all` methods
use std::io::{Read, Write};

// === Helper Function to Handle One Connection ===

// This function takes ownership of a TcpStream (one client connection)
// and returns a Result (Ok if everything worked, Err if something failed)
fn handle_client(mut stream: TcpStream) -> std::io::Result<()> {
    // peer_addr() returns the remote (client's) address
    // The ? operator: if this returns an Err, immediately return that Err
    // If it returns Ok(addr), unwrap the addr and continue
    let peer_addr = stream.peer_addr()?;
    println!("Handling connection from {}", peer_addr);
    
    // Create a buffer to store incoming data
    // [0u8; 1024] means: an array of 1024 bytes, all initialized to 0
    // We need `mut` because `read` will modify this buffer
    let mut buffer = [0u8; 1024];
    
    // Read data from the connection into our buffer
    // read() returns how many bytes were actually read
    // This BLOCKS until data arrives or the connection closes
    let bytes_read = stream.read(&mut buffer)?;
    
    // What if the client closed the connection without sending anything?
    if bytes_read == 0 {
        println!("Client closed connection without sending data");
        return Ok(());
    }
    
    // Convert the bytes we received to a string for display
    // &buffer[..bytes_read] is a slice of just the bytes that were filled in
    // from_utf8_lossy: if there are invalid UTF-8 bytes, replace them with �
    // (We use this because network data might not be valid UTF-8)
    let received = String::from_utf8_lossy(&buffer[..bytes_read]);
    println!("Received {} bytes: {}", bytes_read, received);
    
    // Create a response
    let response = format!("Echo: {}", received);
    
    // Send the response back
    // write_all() sends ALL the bytes, retrying if necessary
    // (plain write() might only send some bytes if the buffer is full)
    // as_bytes() converts the String to &[u8] (a byte slice)
    stream.write_all(response.as_bytes())?;
    
    println!("Sent response to {}", peer_addr);
    
    // Connection is automatically closed when `stream` goes out of scope
    // (Rust's Drop trait handles this)
    Ok(())
}

// === Main Function ===

fn main() -> std::io::Result<()> {
    // === Step 1: Create and bind a listener socket ===
    
    // TcpListener::bind() does several things:
    // 1. Creates a socket
    // 2. Binds it to the specified address (IP:port)
    // 3. Puts it in listening mode
    //
    // "127.0.0.1:8888" means:
    // - 127.0.0.1 = localhost (only accept connections from this machine)
    // - 8888 = the port number
    //
    // If you used "0.0.0.0:8888", it would accept connections from any IP
    // (useful for servers that should be reachable from other machines)
    let listener = TcpListener::bind("127.0.0.1:8888")?;
    
    println!("Server listening on 127.0.0.1:8888");
    println!("Connect with: nc localhost 8888");
    println!("Or run the client program");
    println!();
    
    // === Step 2: Accept connections in a loop ===
    
    // listener.incoming() returns an iterator of connection attempts
    // Each item is a Result<TcpStream, Error>
    for stream_result in listener.incoming() {
        // Handle the Result - did the connection attempt succeed?
        match stream_result {
            Ok(stream) => {
                // We got a connection!
                // Note: This is a simple single-threaded server.
                // While we're handling this client, other clients wait.
                // A real server would spawn a thread or use async.
                if let Err(e) = handle_client(stream) {
                    // Log the error but keep serving other clients
                    eprintln!("Error handling client: {}", e);
                }
            }
            Err(e) => {
                // The connection attempt itself failed
                // (Different from errors while handling a connection)
                eprintln!("Failed to accept connection: {}", e);
            }
        }
    }
    
    Ok(())
}

Key Concepts in the Server

Binding: When we call bind("127.0.0.1:8888"), we're telling the OS: "I want to receive TCP connections sent to this IP address on port 8888." If another program is already using that port, binding fails.

Listening: After binding, the socket is in listening mode, ready to receive connections.

Accepting: incoming() blocks until a client connects, then returns a TcpStream representing that connection. The original listener keeps listening for more connections.

Blocking I/O: When we call stream.read(), our thread stops and waits until data arrives. This is called "blocking." It's simple but means we can only handle one client at a time (in this code). Real servers use threads or async to handle many clients concurrently.

A TCP Client in Rust, Explained Line by Line

use std::net::TcpStream;
use std::io::{Read, Write};

fn main() -> std::io::Result<()> {
    // === Step 1: Connect to the server ===
    
    // TcpStream::connect() does the three-way handshake:
    // 1. Sends SYN to server
    // 2. Receives SYN-ACK from server
    // 3. Sends ACK to server
    //
    // If successful, returns a TcpStream representing the connection.
    // If the server isn't running, or the port is wrong, this fails.
    //
    // Common errors:
    // - ConnectionRefused: server isn't listening on that port
    // - TimedOut: server didn't respond (maybe wrong IP, or firewall)
    
    println!("Connecting to server...");
    let mut stream = TcpStream::connect("127.0.0.1:8888")?;
    
    // local_addr() returns OUR side of the connection
    // peer_addr() returns the SERVER's side
    println!("Connected!");
    println!("  Local address: {}", stream.local_addr()?);
    println!("  Server address: {}", stream.peer_addr()?);
    
    // === Step 2: Send data to the server ===
    
    let message = "Hello from Rust client!";
    println!("\nSending: {}", message);
    
    // write_all() ensures all bytes are sent
    stream.write_all(message.as_bytes())?;
    
    // === Step 3: Receive response from server ===
    
    let mut buffer = [0u8; 1024];
    let bytes_read = stream.read(&mut buffer)?;
    
    if bytes_read == 0 {
        println!("Server closed connection without responding");
    } else {
        let response = String::from_utf8_lossy(&buffer[..bytes_read]);
        println!("Received: {}", response);
    }
    
    // Connection is closed when `stream` goes out of scope
    println!("\nConnection closed.");
    
    Ok(())
}

Running the Example

Create a new Rust project:

cargo new tcp_demo
cd tcp_demo

Replace src/main.rs with the server code. Create src/bin/client.rs with the client code.

Update Cargo.toml:

[package]
name = "tcp_demo"
version = "0.1.0"
edition = "2021"

[[bin]]
name = "server"
path = "src/main.rs"

[[bin]]
name = "client"
path = "src/bin/client.rs"

Run the server:

cargo run --bin server

In another terminal, run the client:

cargo run --bin client

You should see the message sent and echoed back!


A.7: Timeouts and Error Handling

The Fundamental Problem with Networks

When you send a request over a network and don't get a response, what happened?

  1. The request never arrived: Network problem on the way there
  2. The request arrived but the server crashed before processing: Server died
  3. The server processed it but crashed before responding: Work was done but you don't know!
  4. The response was sent but got lost: Network problem on the way back
  5. Everything is fine, just slow: Network congestion, server overloaded

Here's the crucial point: From the client's perspective, all of these look the same. You sent a request. You're waiting. No response yet. Is the server dead? Is it slow? Did your message get lost? You can't tell.

This is the core challenge of distributed systems.

Timeouts: Giving Up After a While

A timeout says: "If I don't hear back within X seconds, assume something went wrong and give up."

use std::net::TcpStream;
use std::time::Duration;
use std::io::{Read, Write};

fn main() -> std::io::Result<()> {
    // Connect with a timeout
    // If the server doesn't respond to our SYN within 5 seconds, give up
    let address = "127.0.0.1:8888".parse().unwrap();
    
    println!("Attempting to connect (5 second timeout)...");
    
    let mut stream = TcpStream::connect_timeout(
        &address,
        Duration::from_secs(5)
    )?;
    
    // Set timeouts for reading and writing
    // These apply to EVERY read/write operation on this stream
    stream.set_read_timeout(Some(Duration::from_secs(5)))?;
    stream.set_write_timeout(Some(Duration::from_secs(5)))?;
    
    println!("Connected! Sending request...");
    stream.write_all(b"Hello?")?;
    
    // Now read the response
    // If no data arrives within 5 seconds, read() will return an error
    let mut buffer = [0u8; 1024];
    
    match stream.read(&mut buffer) {
        Ok(0) => {
            println!("Server closed connection without responding");
        }
        Ok(n) => {
            println!("Received: {}", String::from_utf8_lossy(&buffer[..n]));
        }
        Err(e) => {
            // Check what kind of error
            if e.kind() == std::io::ErrorKind::WouldBlock 
               || e.kind() == std::io::ErrorKind::TimedOut {
                println!("TIMEOUT: No response within 5 seconds");
            } else {
                println!("Error: {}", e);
            }
        }
    }
    
    Ok(())
}

The Timeout Dilemma

How long should the timeout be?

Too short: you give up on requests that would have succeeded, and you retry work the server may still be doing.

Too long: users stare at spinners, and threads sit blocked waiting on servers that are already dead.

There's no perfect answer. It depends on the operation's expected latency, how expensive a false alarm is (retrying work that actually succeeded), and whether the operation is safe to retry at all.

Typical timeouts, as rough orders of magnitude: a few seconds to establish a connection, tens to hundreds of milliseconds for a call within a datacenter, and several seconds for a user-facing request that crosses the internet.

Types of Network Errors

Let's understand the different errors you'll encounter:

use std::net::TcpStream;
use std::io::{self, Read, Write};
use std::time::Duration;

fn demonstrate_errors() {
    // === Connection Refused ===
    // Server exists but nothing is listening on that port
    // You'll see this if you run the client without starting the server
    match TcpStream::connect("127.0.0.1:9999") {
        Err(e) if e.kind() == io::ErrorKind::ConnectionRefused => {
            println!("Connection refused: nothing listening on that port");
        }
        _ => {}
    }
    
    // === Connection Timeout ===
    // Can't reach the server at all (wrong IP, firewall, server down)
    // Using a non-routable IP to demonstrate
    let addr = "10.255.255.1:80".parse().unwrap();
    match TcpStream::connect_timeout(&addr, Duration::from_secs(2)) {
        Err(e) if e.kind() == io::ErrorKind::TimedOut => {
            println!("Connection timeout: couldn't reach server");
        }
        _ => {}
    }
    
    // === Read Timeout ===
    // Connected, but server isn't sending data
    // (Need a server that accepts but doesn't respond to demonstrate)
    
    // === Connection Reset ===
    // Server abruptly closed the connection
    // Usually means the server crashed or explicitly killed the connection
}

Common Error Kinds

ErrorKind          What it means                          Common causes
ConnectionRefused  Server rejected connection             Nothing listening on port, firewall
TimedOut           Operation took too long                Server overloaded, network problem
ConnectionReset    Connection forcibly closed             Server crashed, server rejected request
BrokenPipe         Writing to a closed connection         Server closed while you were sending
NotConnected       Socket isn't connected                 Forgot to connect, or already disconnected
WouldBlock         Non-blocking operation can't complete  Used with non-blocking I/O

Retries with Exponential Backoff

When something fails, a natural instinct is to try again. But naive retries can make problems worse.

Scenario: Server is overloaded, responding slowly. 100 clients timeout and all immediately retry. Now the server has 200 requests to handle. More timeouts. More retries. Server dies.

Solution: Exponential backoff. Wait longer between each retry.

use std::net::TcpStream;
use std::io::{Read, Write};
use std::time::Duration;
use std::thread;

/// Attempts an operation with retries and exponential backoff.
/// 
/// # Arguments
/// * `max_retries` - How many times to try before giving up
/// * `operation` - A closure that attempts the operation
/// 
/// # Returns
/// The result of the first successful attempt, or the last error
fn with_retry<T, E, F>(max_retries: u32, mut operation: F) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
    E: std::fmt::Display,
{
    let mut last_error = None;
    
    for attempt in 0..max_retries {
        if attempt > 0 {
            // Exponential backoff: 1s, 2s, 4s, 8s, ...
            // `1u64 << (attempt - 1)` is 2^(attempt - 1) (bit shifting)
            let wait_secs = 1u64 << (attempt - 1);
            println!("  Waiting {} seconds before retry...", wait_secs);
            thread::sleep(Duration::from_secs(wait_secs));
        }
        
        println!("Attempt {} of {}...", attempt + 1, max_retries);
        
        match operation() {
            Ok(result) => {
                println!("  Success!");
                return Ok(result);
            }
            Err(e) => {
                println!("  Failed: {}", e);
                last_error = Some(e);
            }
        }
    }
    
    Err(last_error.expect("Should have at least one error"))
}

fn make_request() -> std::io::Result<String> {
    let mut stream = TcpStream::connect_timeout(
        &"127.0.0.1:8888".parse().unwrap(),
        Duration::from_secs(2),
    )?;
    
    stream.set_read_timeout(Some(Duration::from_secs(2)))?;
    stream.write_all(b"Hello")?;
    
    let mut buffer = [0u8; 1024];
    let n = stream.read(&mut buffer)?;
    
    Ok(String::from_utf8_lossy(&buffer[..n]).to_string())
}

fn main() {
    match with_retry(4, make_request) {
        Ok(response) => println!("\nFinal result: {}", response),
        Err(e) => println!("\nAll retries failed: {}", e),
    }
}

Idempotency: A Critical Concept

Question: If a request times out, should you retry?

It depends on whether the operation is idempotent.

Idempotent means: doing it multiple times has the same effect as doing it once.

Operation                                Idempotent?  Safe to retry?
GET a webpage                            Yes          Yes
Set user's email to "alice@example.com"  Yes          Yes
Delete record with ID 123                Yes          Yes (if it's already gone, that's fine)
Add $100 to account                      No           No (might add $200!)
Create new order                         No           No (might create a duplicate!)
Increment counter                        No           No

Why this matters: Remember, you don't know if your request succeeded. Maybe the server processed it and the response got lost. If you retry a non-idempotent operation, you might do it twice!

Making operations idempotent: two common techniques are sending absolute values instead of deltas ("set balance to 150" rather than "add 100"), and attaching a unique idempotency key to each request so the server can recognize and ignore duplicates, as sketched below.
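
A minimal sketch of the idempotency-key idea (the names and types here are illustrative, not any real API): the server remembers the outcome for every key it has processed, so a retried request gets the original result back instead of repeating the side effect.

use std::collections::HashMap;

// Illustrative server-side state: a balance plus a memo of handled requests.
struct Account {
    balance: i64,
    processed: HashMap<String, i64>, // idempotency key -> resulting balance
}

impl Account {
    // "Add $amount" is not idempotent by itself, but keyed requests are:
    // a duplicate key returns the recorded result without re-applying.
    fn deposit(&mut self, idempotency_key: &str, amount: i64) -> i64 {
        if let Some(&result) = self.processed.get(idempotency_key) {
            return result; // duplicate (e.g. a retry): no double deposit
        }
        self.balance += amount;
        self.processed.insert(idempotency_key.to_string(), self.balance);
        self.balance
    }
}

fn main() {
    let mut acct = Account { balance: 100, processed: HashMap::new() };
    // The client times out and retries, reusing the same key.
    assert_eq!(acct.deposit("req-42", 100), 200);
    assert_eq!(acct.deposit("req-42", 100), 200); // the retry is harmless
    println!("final balance: {}", acct.balance);
}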


A.8: HTTP: The Protocol of the Web

What is HTTP?

HTTP (HyperText Transfer Protocol) is an application-layer protocol built on TCP. It's how web browsers talk to web servers, how APIs communicate, and probably the most common protocol you'll work with.

The Request-Response Pattern

HTTP follows a simple pattern:

  1. Client sends a request (method, path, headers, optional body)
  2. Server sends a response (status code, headers, optional body)
Client                              Server
   │                                   │
   │──── HTTP Request ─────────────────>
   │     GET /users/42 HTTP/1.1        │
   │     Host: api.example.com         │
   │     Accept: application/json      │
   │                                   │
   │<───────────────── HTTP Response ──│
   │     HTTP/1.1 200 OK               │
   │     Content-Type: application/json│
   │     {"id": 42, "name": "Alice"}   │
   │                                   │

Anatomy of an HTTP Request

GET /search?q=rust HTTP/1.1       ← Request line: METHOD PATH VERSION
Host: www.google.com               ← Headers start
User-Agent: Mozilla/5.0
Accept: text/html
Accept-Language: en-US
                                   ← Empty line marks end of headers
                                   ← Body would go here (for POST/PUT)

Request line components: the method (what action to perform), the path (which resource), and the HTTP version.

Common HTTP methods:

Method  Meaning                Has body?  Idempotent?
GET     Retrieve data          No         Yes
POST    Create new data        Yes        No
PUT     Replace data entirely  Yes        Yes
PATCH   Update data partially  Yes        Not always
DELETE  Remove data            No         Yes
HEAD    Like GET but no body   No         Yes

Anatomy of an HTTP Response

HTTP/1.1 200 OK                    ← Status line: VERSION CODE MESSAGE
Content-Type: application/json     ← Headers start
Content-Length: 42
Date: Mon, 01 Jan 2024 12:00:00 GMT
                                   ← Empty line marks end of headers
{"id": 42, "name": "Alice"}        ← Body

Common status codes:

Code  Meaning                When you'll see it
200   OK                     Request succeeded
201   Created                POST succeeded, resource created
204   No Content             Success, but no body to return
301   Moved Permanently      Resource has a new URL (redirect)
302   Found                  Temporary redirect
400   Bad Request            Your request was malformed
401   Unauthorized           Need to authenticate
403   Forbidden              Authenticated but not allowed
404   Not Found              Resource doesn't exist
429   Too Many Requests      Rate limited
500   Internal Server Error  Server crashed
502   Bad Gateway            Proxy/load balancer can't reach backend
503   Service Unavailable    Server overloaded or down
504   Gateway Timeout        Proxy/load balancer timeout
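
Before reaching for a library, it's worth seeing that HTTP/1.1 is just text over TCP. This sketch hand-writes a request using the TcpStream from A.6 (plain HTTP on port 80 against example.com; real traffic is HTTPS, which needs TLS and therefore a library):

use std::io::{Read, Write};
use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    // Plain HTTP on port 80. "Connection: close" tells the server to
    // close the socket after responding, so read_to_end terminates.
    let mut stream = TcpStream::connect("example.com:80")?;
    stream.write_all(
        b"GET / HTTP/1.1\r\n\
          Host: example.com\r\n\
          Connection: close\r\n\
          \r\n",
    )?;

    let mut response = Vec::new();
    stream.read_to_end(&mut response)?;
    let text = String::from_utf8_lossy(&response);

    // The status line is the first line of the response.
    println!("{}", text.lines().next().unwrap_or(""));
    Ok(())
}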

Making HTTP Requests in Rust

For HTTP in Rust, we'll use the reqwest crate. It handles all the complexity (TLS, connection pooling, etc.).

# Cargo.toml
[dependencies]
reqwest = { version = "0.11", features = ["blocking", "json"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"

And in src/main.rs:

use serde::Deserialize;

// Define a struct matching the JSON we expect
// `Deserialize` is a serde trait that lets us parse JSON into this struct
#[derive(Debug, Deserialize)]
struct User {
    login: String,
    id: u64,
    name: Option<String>,  // Option because this field might be null
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // === Simple GET request ===
    
    println!("Fetching user from GitHub API...\n");
    
    // The `blocking` module provides synchronous (blocking) HTTP
    // There's also async reqwest for production use
    // Note: GitHub's API rejects requests that carry no User-Agent
    // header, so we build a client that sends one
    let client = reqwest::blocking::Client::builder()
        .user_agent("rust-http-example")
        .build()?;
    let response = client
        .get("https://api.github.com/users/octocat")
        .send()?;
    
    // Check the status
    println!("Status: {}", response.status());
    println!("Headers:");
    for (name, value) in response.headers() {
        println!("  {}: {:?}", name, value);
    }
    
    // Parse JSON body into our struct
    // The `: User` type annotation tells serde what type to deserialize into
    let user: User = response.json()?;
    println!("\nParsed user:");
    println!("  Login: {}", user.login);
    println!("  ID: {}", user.id);
    println!("  Name: {:?}", user.name);
    
    Ok(())
}

Building More Complex Requests

use reqwest::blocking::Client;
use reqwest::header::{AUTHORIZATION, CONTENT_TYPE, USER_AGENT};
use serde::{Deserialize, Serialize};
use std::time::Duration;

// For request body
#[derive(Serialize)]
struct CreateIssue {
    title: String,
    body: String,
}

// For response body
#[derive(Debug, Deserialize)]
struct Issue {
    id: u64,
    number: u64,
    title: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a reusable client with configuration
    // In production, you'd typically create ONE client and reuse it
    // (it maintains a connection pool)
    let client = Client::builder()
        .timeout(Duration::from_secs(10))  // Request timeout
        .user_agent("my-rust-app/1.0")      // Identify ourselves
        .build()?;
    
    // Build a POST request
    let issue = CreateIssue {
        title: "Bug report".to_string(),
        body: "Something is broken!".to_string(),
    };
    
    let response = client
        .post("https://api.github.com/repos/owner/repo/issues")
        .header(AUTHORIZATION, "Bearer your-token-here")
        .header(CONTENT_TYPE, "application/json")
        .json(&issue)  // Serialize `issue` to JSON and set as body
        .send()?;
    
    // Handle different status codes
    match response.status().as_u16() {
        201 => {
            let created: Issue = response.json()?;
            println!("Created issue #{}: {}", created.number, created.title);
        }
        401 => {
            println!("Authentication failed - check your token");
        }
        403 => {
            println!("Forbidden - you don't have permission");
        }
        404 => {
            println!("Repository not found");
        }
        422 => {
            // Unprocessable Entity - validation error
            let error: serde_json::Value = response.json()?;
            println!("Validation error: {}", error);
        }
        code => {
            println!("Unexpected status: {}", code);
            println!("Body: {}", response.text()?);
        }
    }
    
    Ok(())
}

PART B: DISTRIBUTED SYSTEMS CONCEPTS

The Core Theory

This is the main content. Part A was background; this is what you're here to learn.


B.1: Why Distributed Systems Are Hard (The Eight Fallacies)

Now that you understand how networks actually work, let's revisit why distributed systems are challenging.

In 1994, L. Peter Deutsch at Sun Microsystems catalogued the assumptions that developers new to distributed systems incorrectly make; colleagues including James Gosling later rounded the list out to eight. These are the Eight Fallacies of Distributed Computing.

Fallacy 1: The Network is Reliable

The assumption: When I send a message, it will arrive.

Reality: You now understand why this is false. Packets can be lost, reordered, duplicated, corrupted, or delayed: all the failure modes from A.2.

Even with TCP, which handles retransmission, connections can fail entirely if the network is partitioned.

What this means for your code: every remote call needs a timeout, a retry policy, and an answer to the question "what if the request succeeded but I never saw the response?" (idempotency, covered in A.7).

Fallacy 2: Latency is Zero

The assumption: Network communication is instant.

Reality: The speed of light is a real constraint.

Distance         One-way latency (theoretical minimum in fiber)
Same rack        0.0005 ms
Same datacenter  0.5 ms
NYC to London    28 ms
NYC to Tokyo     50 ms

And that's just physics. Real latency includes queuing delays in routers, TLS handshakes, operating-system scheduling, and the server's own processing time.

What this means for your code: minimize round trips. Batch related requests, avoid chatty one-call-per-item patterns, and cache data close to where it's used.

Fallacy 3: Bandwidth is Infinite

The assumption: You can send as much data as you want.

Reality: Links have limited capacity, and you share them with everyone else.

What this means: paginate large results, compress payloads, and don't ship data the caller doesn't need.

Fallacy 4: The Network is Secure

The assumption: Data sent over the network is safe.

Reality: Your packets pass through many routers you don't control. Without encryption, anyone can read or modify them.

What this means: encrypt traffic in transit (TLS), authenticate both ends, and validate anything that arrives from the network.

Fallacy 5: Topology Doesn't Change

The assumption: The network structure stays the same.

Reality: Servers come and go. Routes change. Cloud instances are ephemeral.

What this means: don't hardcode addresses. Use DNS or service discovery, and expect peers to appear, move, and disappear while your system is running.

Fallacy 6: There is One Administrator

The assumption: One person or team controls the entire system.

Reality: You depend on your ISP, cloud provider, third-party APIs, DNS providers, CDNs...

What this means: parts of your system will fail in ways you can't fix or even inspect. Design for graceful degradation when a dependency you don't control misbehaves.

Fallacy 7: Transport Cost is Zero

The assumption: Sending data is free.

Reality: Network calls cost money (cloud egress fees), CPU (serializing and deserializing on both ends), and time.

Example: AWS charges $0.09/GB for data leaving their network. A service sending 100TB/month pays $9,000 just for egress.

Fallacy 8: The Network is Homogeneous

The assumption: All parts of the network are the same.

Reality: Different links have different speeds, reliability, and characteristics. The last mile to a mobile phone is very different from a datacenter interconnect.


A.9: Concurrency: Hard Even on One Machine

Before we get deeper into distributed systems, we need to understand concurrency. This matters because:

  1. Concurrency bugs are hard to find on one machine
  2. In distributed systems, they're even harder
  3. You can't use simple locks across machines

Why Concurrency Matters

A web server needs to handle thousands of connections. If each request takes 100ms, and you handle them one at a time, you can only do 10 requests/second. That's terrible.

You need to handle multiple requests concurrently.
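
A minimal sketch of the standard fix, building on the echo server from A.6: spawn one thread per connection, so a slow client no longer blocks everyone else. Real servers cap the thread count with a pool or use async, but this is the simplest version:

use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

fn handle_client(mut stream: TcpStream) -> std::io::Result<()> {
    let mut buffer = [0u8; 1024];
    let n = stream.read(&mut buffer)?;
    stream.write_all(&buffer[..n]) // echo back whatever arrived
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8888")?;
    for stream in listener.incoming() {
        match stream {
            // Each connection gets its own thread: while one client is
            // being served (or stalling), others are accepted and served.
            Ok(stream) => {
                thread::spawn(move || {
                    if let Err(e) = handle_client(stream) {
                        eprintln!("client error: {}", e);
                    }
                });
            }
            Err(e) => eprintln!("accept failed: {}", e),
        }
    }
    Ok(())
}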

Processes vs. Threads

Process: A running program with its own isolated memory space. Processes can't accidentally corrupt each other's memory.

Thread: A lightweight unit of execution within a process. All threads in a process share the same memory space.

┌────────────────────────────────────────────────────────┐
│                      Process                            │
│  ┌──────────────────────────────────────────────────┐  │
│  │              Shared Memory Space                  │  │
│  │                                                   │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌──────────┐  │  │
│  │  │  Thread 1   │  │  Thread 2   │  │ Thread 3 │  │  │
│  │  │             │  │             │  │          │  │  │
│  │  │ Can access  │  │ Can access  │  │Can access│  │  │
│  │  │ any variable│  │ any variable│  │   any    │  │  │
│  │  │ in shared   │  │ in shared   │  │ variable │  │  │
│  │  │ memory!     │  │ memory!     │  │          │  │  │
│  │  └─────────────┘  └─────────────┘  └──────────┘  │  │
│  │                                                   │  │
│  └──────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────┘

Sharing memory is powerful but dangerous.

Race Conditions

A race condition occurs when the result of a program depends on the timing of uncontrollable events (like thread scheduling).

Let's see a classic example. This code is intentionally broken to demonstrate the problem:

use std::thread;

// This is DELIBERATELY BROKEN CODE to demonstrate race conditions
// DON'T DO THIS - Rust actually prevents this at compile time!

fn main() {
    // Imagine we could share this between threads (we can't without synchronization)
    let mut counter = 0;
    
    // Thread 1 wants to increment counter
    // Internally this is: read counter → add 1 → write counter
    
    // Thread 2 also wants to increment counter
    // Same steps: read counter → add 1 → write counter
    
    // If both threads run truly concurrently:
    //
    //   Thread 1              Thread 2
    //   --------              --------
    //   read counter (0)
    //                         read counter (0)
    //   add 1 (result: 1)
    //                         add 1 (result: 1)
    //   write counter (1)
    //                         write counter (1)
    //
    // Final value: 1 (should be 2!)
    // This is a race condition.
}

Both threads read 0, both add 1, both write 1. One increment is lost!

Rust's protection: Rust won't let you write this code. The compiler enforces that mutable data is either owned and mutated by a single thread at a time, or wrapped in a synchronization type like Mutex. The broken sketch above simply doesn't compile.

Mutex: Mutual Exclusion

A Mutex (mutual exclusion) is a lock that ensures only one thread can access data at a time.

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // === Breaking down the types ===
    //
    // Mutex<i32>: A mutex protecting an i32
    //   - Can only access the i32 by "locking" the mutex
    //   - Lock is released automatically when it goes out of scope
    //
    // Arc<T>: "Atomically Reference Counted" smart pointer
    //   - Allows multiple ownership of T
    //   - When all Arcs are dropped, T is dropped
    //   - "Atomic" means the reference counting is thread-safe
    //
    // Why Arc<Mutex<T>>?
    //   - Mutex: provides safe mutable access from multiple threads
    //   - Arc: allows multiple threads to own the mutex
    //   - Together: multiple threads can safely share and modify data
    
    let counter = Arc::new(Mutex::new(0));
    
    let mut handles = vec![];
    
    for i in 0..10 {
        // Clone the Arc to get a new reference to the same Mutex
        // This DOESN'T clone the underlying data, just increments the ref count
        let counter_clone = Arc::clone(&counter);
        
        // spawn() requires the closure to have 'static lifetime
        // Using `move` transfers ownership of `counter_clone` into the thread
        let handle = thread::spawn(move || {
            // === Locking the mutex ===
            //
            // lock() does two things:
            // 1. Blocks until we acquire the lock (waits if another thread holds it)
            // 2. Returns a MutexGuard<i32> that lets us access the data
            //
            // unwrap() is needed because lock() returns a Result
            // It can fail if a thread panicked while holding the lock ("poisoned")
            let mut num = counter_clone.lock().unwrap();
            
            // `num` is a MutexGuard<i32>, which implements DerefMut
            // So `*num` gives us &mut i32
            *num += 1;
            
            println!("Thread {} incremented counter to {}", i, *num);
            
            // When `num` goes out of scope here, the MutexGuard is dropped
            // This automatically releases the lock
        });
        
        handles.push(handle);
    }
    
    // Wait for all threads to complete
    for handle in handles {
        handle.join().unwrap();
    }
    
    // Get the final value
    // Again, we lock to access the data
    println!("\nFinal counter value: {}", *counter.lock().unwrap());
}

Breaking Down the Types

Let's make sure the types are crystal clear:

// Just the value - no sharing possible
let x: i32 = 0;

// Mutex around the value - one thread can access at a time
// But we can't share this Mutex across threads!
let m: Mutex<i32> = Mutex::new(0);

// Rc (Reference Counted) - multiple ownership, but NOT thread-safe
// let r: Rc<Mutex<i32>> = Rc::new(Mutex::new(0));  // Can't send to threads!

// Arc (Atomic Reference Counted) - multiple ownership, IS thread-safe
let a: Arc<Mutex<i32>> = Arc::new(Mutex::new(0));
// NOW we can clone `a` and send clones to different threads

Deadlock

Deadlock occurs when two or more threads are each waiting for the other to release a resource.

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let lock_a = Arc::new(Mutex::new(0));
    let lock_b = Arc::new(Mutex::new(0));
    
    let lock_a_clone = Arc::clone(&lock_a);
    let lock_b_clone = Arc::clone(&lock_b);
    
    // Thread 1: Locks A, then tries to lock B
    let handle1 = thread::spawn(move || {
        let _a = lock_a.lock().unwrap();
        println!("Thread 1 has lock A, trying to get lock B...");
        thread::sleep(std::time::Duration::from_millis(100));  // Give thread 2 time
        let _b = lock_b.lock().unwrap();  // WAITS FOREVER - thread 2 has B
        println!("Thread 1 has both locks");
    });
    
    // Thread 2: Locks B, then tries to lock A
    let handle2 = thread::spawn(move || {
        let _b = lock_b_clone.lock().unwrap();
        println!("Thread 2 has lock B, trying to get lock A...");
        thread::sleep(std::time::Duration::from_millis(100));
        let _a = lock_a_clone.lock().unwrap();  // WAITS FOREVER - thread 1 has A
        println!("Thread 2 has both locks");
    });
    
    // This program will hang forever!
    // Thread 1: Holds A, waiting for B
    // Thread 2: Holds B, waiting for A
    
    handle1.join().unwrap();
    handle2.join().unwrap();
}

Avoiding deadlock:

  1. Always acquire locks in the same order (see the sketch after this list)
  2. Use timeout-based lock acquisition
  3. Design to avoid needing multiple locks
  4. Use lock-free data structures when possible
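
Here is rule 1 from the list above as code: the deadlocking example rewritten so both threads take lock A first, then lock B. With one global lock order, the circular wait can never form.

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let lock_a = Arc::new(Mutex::new(0));
    let lock_b = Arc::new(Mutex::new(0));

    let (a1, b1) = (Arc::clone(&lock_a), Arc::clone(&lock_b));
    let handle1 = thread::spawn(move || {
        let _a = a1.lock().unwrap(); // A first...
        let _b = b1.lock().unwrap(); // ...then B
        println!("Thread 1 has both locks");
    });

    let (a2, b2) = (Arc::clone(&lock_a), Arc::clone(&lock_b));
    let handle2 = thread::spawn(move || {
        // Same order as thread 1: A first, then B. Even if thread 1
        // holds A right now, thread 2 simply waits; no cycle is possible.
        let _a = a2.lock().unwrap();
        let _b = b2.lock().unwrap();
        println!("Thread 2 has both locks");
    });

    handle1.join().unwrap();
    handle2.join().unwrap();
}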

Why This Matters for Distributed Systems

In a distributed system, you can't use Mutex across machines. Each server has its own memory. There's no shared Arc<Mutex<T>>.

But the same problems exist! Imagine two database servers:

Server A reads balance: 100
Server B reads balance: 100
Server A writes balance: 50
Server B writes balance: 70  ← Server A's write is lost!

This is the same race condition, but across machines. Solving it requires distributed coordination mechanisms: consensus algorithms, distributed transactions, etc. You'll learn these in later phases.


A.10: Serialization: Turning Data into Bytes

When machines communicate, they need to agree on how to represent data as bytes.

The Problem

Your Rust program has types:

struct User {
    name: String,
    age: u32,
    friends: Vec<String>,
}

The network only carries bytes. How do you send this struct to another machine (which might be running Python or Go)?

Serialization and Deserialization

Serialization: Converting data structures to bytes (also called encoding, marshaling)
Deserialization: Converting bytes back to data structures (also called decoding, unmarshaling)

Rust struct                          Bytes                           Struct again (maybe in another language)
User { name: "Alice", ... }  →  [123, 34, 110, 97, ...]  →  User { name: "Alice", ... }
         serialize                                                 deserialize

Serde: Rust's Serialization Framework

Serde is the de facto standard for serialization in Rust. It's a framework that lets you support many formats (JSON, YAML, TOML, MessagePack, etc.) with the same code.

# Cargo.toml
[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"  # JSON format
# bincode = "1.3"   # Binary format (faster, smaller)
# toml = "0.8"      # TOML format

Basic Usage

use serde::{Serialize, Deserialize};

// === The derive macros ===
//
// #[derive(Serialize, Deserialize)] automatically implements
// the Serialize and Deserialize traits for your struct.
//
// This generates code that knows how to convert your struct
// to/from various formats (JSON, binary, etc.)
#[derive(Debug, Serialize, Deserialize)]
struct User {
    name: String,
    age: u32,
    email: Option<String>,  // Optional fields become null in JSON
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let user = User {
        name: "Alice".to_string(),
        age: 30,
        email: Some("alice@example.com".to_string()),
    };
    
    // === Serialize to JSON string ===
    let json_string: String = serde_json::to_string(&user)?;
    println!("JSON string: {}", json_string);
    // Output: {"name":"Alice","age":30,"email":"alice@example.com"}
    
    // === Serialize to "pretty" JSON (with indentation) ===
    let pretty_json: String = serde_json::to_string_pretty(&user)?;
    println!("\nPretty JSON:\n{}", pretty_json);
    
    // === Serialize to bytes (Vec<u8>) ===
    let json_bytes: Vec<u8> = serde_json::to_vec(&user)?;
    println!("\nBytes: {:?}", &json_bytes[..20]);  // First 20 bytes
    
    // === Deserialize from JSON string ===
    let json_str = r#"{"name":"Bob","age":25,"email":null}"#;
    let user2: User = serde_json::from_str(json_str)?;
    println!("\nDeserialized: {:?}", user2);
    
    // === Deserialize from bytes ===
    let user3: User = serde_json::from_slice(&json_bytes)?;
    println!("From bytes: {:?}", user3);
    
    Ok(())
}

Serde Attributes for Customization

use serde::{Serialize, Deserialize};

#[derive(Debug, Serialize, Deserialize)]
struct Message {
    // Rename field in JSON
    #[serde(rename = "messageId")]
    id: u64,
    
    // Use different name for serializing vs deserializing
    #[serde(rename(serialize = "msg", deserialize = "message"))]
    content: String,
    
    // Skip this field entirely
    #[serde(skip)]
    internal_state: u32,
    
    // Use default value if missing during deserialization
    #[serde(default)]
    retries: u32,
    
    // Custom default value
    #[serde(default = "default_priority")]
    priority: u32,
}

fn default_priority() -> u32 {
    5
}

Binary vs Text Formats

JSON (serde_json): human-readable text, understood by virtually every language and easy to debug, but verbose and relatively slow to parse.

Bincode: a compact Rust-centric binary format, fast to encode and decode, but not human-readable and not intended for cross-language use.

// With bincode (add to Cargo.toml: bincode = "1.3")

use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize, Debug)]
struct Data {
    values: Vec<u64>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = Data {
        values: (0..1000).collect(),
    };
    
    // JSON serialization
    let json = serde_json::to_vec(&data)?;
    println!("JSON size: {} bytes", json.len());
    
    // Bincode serialization
    let binary = bincode::serialize(&data)?;
    println!("Bincode size: {} bytes", binary.len());
    
    // For this particular data, bincode is actually LARGER: small
    // numbers are short as ASCII digits but always 8 bytes as raw u64s.
    // JSON:    ~3,900 bytes (ASCII digits plus commas)
    // Bincode: ~8,008 bytes (8 bytes per u64 + 8-byte length)
    // Bincode wins on parse speed, and on strings and large values.
    
    Ok(())
}

Why Serialization Matters for Distributed Systems

  1. All network communication involves serialization. Every RPC call, every message to a queue, every HTTP request serializes data.

  2. Format choice affects performance. In high-throughput systems, serialization can be a bottleneck.

  3. Schema evolution is tricky. What happens when you add a field? Remove one? Old code needs to read new data and vice versa.

  4. Cross-language compatibility. If you're talking to a Python service, you need a format both understand. JSON and Protocol Buffers are common choices.


B.2: Latency vs. Throughput

Two critical metrics for distributed systems.

Definitions

Latency: How long it takes to complete one operation.

Throughput: How many operations you complete in a given time.

The Highway Analogy

Imagine a highway between two cities, 100 miles apart. Cars travel at 50 mph.

Latency = Time for one car to travel from City A to City B = 2 hours

Throughput = Cars arriving at City B per hour

Now imagine we go from one lane to four. If one lane delivers 100 cars per hour, four lanes deliver 400.

We quadrupled throughput! But latency is still 2 hours. More lanes don't make individual cars go faster.

Key insight: You can often improve throughput without improving latency, and vice versa.

In Distributed Systems

Strategy                   Affects latency?        Affects throughput?
Put server closer to user  ✓ Reduces               Minimal
Add more servers           Minimal                 ✓ Increases
Add caching                ✓ Reduces (cache hits)  ✓ Increases
Batch operations           Usually increases       ✓ Increases
Use faster hardware        ✓ Reduces               ✓ Increases

Latency-Throughput Tradeoffs

Sometimes you must choose:

Batching: Instead of processing requests one at a time, wait until you have 100 and process them together. Per-request overhead drops, so throughput rises, but the first request in the batch sits waiting, so latency rises.

Replication: Write data to 3 servers before acknowledging. Durability rises, but every write now takes as long as the slowest replica, so latency rises.

Measuring Latency: Percentiles

Don't use averages. They hide problems.

Example: 99 requests take 10ms. 1 request takes 10,000ms. The average is about 110ms, a number that describes no actual request: 99% of users saw 10ms, and one user saw 10 full seconds.

Use percentiles instead: p50 (the median), p95, p99, and p99.9 are the latencies that 50%, 95%, 99%, and 99.9% of requests beat. In the example, p50 = 10ms while the maximum is 10,000ms.

Why high percentiles matter:

If a user's page load requires 10 backend calls, and each has p99 = 1 second: the chance that at least one call lands in the slow tail is 1 - 0.99^10, roughly 10%. About one page load in ten is slow, even though every individual service looks fine 99% of the time.

This is why Amazon famously targets p99.9 latency for critical services.
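
A minimal way to compute percentiles from a batch of recorded latencies, using the nearest-rank method (real systems usually keep histograms, e.g. HDRHistogram, rather than storing every sample):

// Nearest-rank percentile: sort the samples, then index into them.
fn percentile(sorted_ms: &[u64], p: f64) -> u64 {
    let rank = ((p / 100.0) * sorted_ms.len() as f64).ceil() as usize;
    sorted_ms[rank.saturating_sub(1).min(sorted_ms.len() - 1)]
}

fn main() {
    // 99 fast requests and one terrible one, as in the example above.
    let mut latencies_ms: Vec<u64> = vec![10; 99];
    latencies_ms.push(10_000);
    latencies_ms.sort_unstable();

    let n = latencies_ms.len() as u64;
    let avg = latencies_ms.iter().sum::<u64>() / n;
    println!("average = {} ms (describes no real request)", avg);
    println!("p50  = {} ms", percentile(&latencies_ms, 50.0));
    println!("p99  = {} ms", percentile(&latencies_ms, 99.0));
    println!("p100 = {} ms", percentile(&latencies_ms, 100.0));
}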


B.3: Time in Distributed Systems

The Problem with Clocks

On a single computer, time is simple. There's one clock, events happen in sequence.

In a distributed system, each computer has its own clock. These clocks:

  1. Drift: Even the best quartz clocks drift ~1 second per month. Cheap ones drift more.

  2. Jump: When a computer synchronizes with NTP (Network Time Protocol), its clock might jump forward or even backward.

  3. Disagree: Right now, two servers in your datacenter probably show slightly different times.
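
This distrust of wall clocks is baked into Rust's standard library: SystemTime::duration_since returns a Result precisely because the wall clock can jump backward, while Instant is monotonic and safe for measuring elapsed time on one machine (and meaningless across machines):

use std::time::{Duration, Instant, SystemTime};

fn main() {
    // Wall-clock time: can jump backward (NTP sync, manual changes),
    // so comparing two readings can legitimately fail.
    let t1 = SystemTime::now();
    let t2 = SystemTime::now();
    match t2.duration_since(t1) {
        Ok(elapsed) => println!("wall clock advanced {:?}", elapsed),
        Err(e) => println!("wall clock went BACKWARD by {:?}", e.duration()),
    }

    // Monotonic time: guaranteed never to go backward, but only
    // meaningful within this one machine (and this one boot).
    let start = Instant::now();
    std::thread::sleep(Duration::from_millis(10));
    println!("monotonic elapsed: {:?}", start.elapsed());
}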

Why This Matters

Scenario: A distributed database. Two servers receive writes:

Server A (clock: 10:00:00.000): SET user.email = "alice@new.com"
Server B (clock: 10:00:00.001): SET user.email = "alice@old.com"

Which write wins? If we pick "latest timestamp wins," Server B's write wins. But what if Server B's clock is 5ms fast? Server A's write was actually later!

Result: User sets their email to "new.com", but the system stores "old.com". User is very confused.

Physical Time vs. Logical Time

Physical time: What the clock on the wall says.

Logical time: A way to order events based on causality, not wall clocks.

This is what Leslie Lamport figured out in 1978.


B.4: Lamport Clocks

The Insight

Leslie Lamport realized:

We don't need to know when events happened. We just need to know the order in which they could have affected each other.

If I send you a message, my sending definitely happened before your receiving. That's causality, and it doesn't require synchronized clocks.

The "Happened-Before" Relation

Lamport defined a partial ordering called "happened-before" (written →):

Event A happened-before Event B (A → B) if:

  1. Same process, sequential order: A and B are events in the same process, and A occurred before B.

  2. Message send/receive: A is the sending of a message, and B is the receiving of that same message.

  3. Transitivity: If A → B and B → C, then A → C.

If neither A → B nor B → A, the events are "concurrent" (written A || B). Neither could have influenced the other.

The Lamport Clock Algorithm

Each process maintains a logical clock: just a counter.

Rules:

  1. Before any event: Increment your clock.
  2. When sending a message: Include your current clock value.
  3. When receiving a message with timestamp t: Set your clock to max(your_clock, t) + 1.

Let's trace through an example:

Process P1 (clock=0)              Process P2 (clock=0)
        │                                 │
        │ [Internal event]                │
        │ clock = 0 + 1 = 1               │
        │                                 │
        │ [Send message to P2]            │
        │ clock = 1 + 1 = 2               │
        │ msg contains timestamp 2        │
        │─────────────────────────────────>
        │                                 │
        │                           [Receive message with ts=2]
        │                           clock = max(0, 2) + 1 = 3
        │                                 │
        │                           [Internal event]
        │                           clock = 3 + 1 = 4
        │                                 │
        │                           [Send message to P1]
        │                           clock = 4 + 1 = 5
        │<─────────────────────────────────
        │                           msg contains timestamp 5
        │                                 │
[Receive message with ts=5]               │
clock = max(2, 5) + 1 = 6                 │
        │                                 │

The Guarantee

If A → B, then LC(A) < LC(B).

If A happened-before B, then A's Lamport clock is strictly less than B's.

This means: sorting events by Lamport clock gives you an ordering consistent with causality.

The Limitation

The converse is NOT true.

If LC(A) < LC(B), we CANNOT conclude A → B. The events might be concurrent.

Example:

P1: Event X with LC = 5
P2: Event Y with LC = 3

We can rule out one case: X → Y is impossible, because that would require LC(X) < LC(Y), and 5 > 3. But did Y happen-before X, or are X and Y concurrent?

We can't tell from Lamport clocks alone!

B.4: Lamport Clocks (continued)

Implementation in Rust

/// A Lamport clock for logical time ordering.
/// 
/// # Guarantees
/// If event A happened-before event B, then A's timestamp < B's timestamp.
/// 
/// # Non-guarantees
/// If A's timestamp < B's timestamp, we cannot conclude A happened-before B.
/// The events might be concurrent.
#[derive(Debug, Clone)]
pub struct LamportClock {
    // The current logical time
    // Using u64 gives us ~584 years at 1 billion events/second
    time: u64,
}

impl LamportClock {
    /// Create a new clock starting at 0.
    pub fn new() -> Self {
        LamportClock { time: 0 }
    }
    
    /// Increment the clock before any local event.
    /// Returns the new timestamp.
    /// 
    /// Call this before any local operation that you want to timestamp:
    /// - Writing to a database
    /// - Sending a message
    /// - Any state change
    pub fn tick(&mut self) -> u64 {
        self.time += 1;
        self.time
    }
    
    /// Get a timestamp to include in an outgoing message.
    /// This increments the clock (sending is an event).
    pub fn send_timestamp(&mut self) -> u64 {
        self.tick()
    }
    
    /// Update the clock when receiving a message.
    /// 
    /// This merges our local time with the sender's time,
    /// ensuring our clock is at least as high as the sender's.
    /// 
    /// # Arguments
    /// * `received_timestamp` - The timestamp from the received message
    /// 
    /// # Returns
    /// The new local timestamp after receiving
    pub fn receive(&mut self, received_timestamp: u64) -> u64 {
        // Take the max of our time and received time
        // Then increment (receiving is an event)
        self.time = self.time.max(received_timestamp) + 1;
        self.time
    }
    
    /// Get current time without incrementing.
    /// Useful for debugging/logging.
    pub fn current_time(&self) -> u64 {
        self.time
    }
}

// Let's verify the behavior
fn main() {
    println!("=== Lamport Clock Demo ===\n");
    
    let mut clock_p1 = LamportClock::new();
    let mut clock_p2 = LamportClock::new();
    
    // P1 does an internal event
    let t1 = clock_p1.tick();
    println!("P1 internal event: timestamp = {}", t1);  // 1
    
    // P1 sends a message
    let send_ts = clock_p1.send_timestamp();
    println!("P1 sends message with timestamp = {}", send_ts);  // 2
    
    // P2 receives the message
    let recv_ts = clock_p2.receive(send_ts);
    println!("P2 receives message, new timestamp = {}", recv_ts);  // 3
    
    // P2 does an internal event
    let t2 = clock_p2.tick();
    println!("P2 internal event: timestamp = {}", t2);  // 4
    
    // Meanwhile, P1 does another event (hasn't heard from P2 yet)
    let t3 = clock_p1.tick();
    println!("P1 internal event (no sync): timestamp = {}", t3);  // 3
    
    // P2 sends a message
    let send_ts_2 = clock_p2.send_timestamp();
    println!("P2 sends message with timestamp = {}", send_ts_2);  // 5
    
    // P1 receives it
    let recv_ts_2 = clock_p1.receive(send_ts_2);
    println!("P1 receives message, new timestamp = {}", recv_ts_2);  // 6
    
    println!("\n=== Observations ===");
    println!("Notice how P1's clock jumped from 3 to 6 after receiving P2's message.");
    println!("This ensures P1's subsequent events have timestamps > P2's message.");
}

Output:

=== Lamport Clock Demo ===

P1 internal event: timestamp = 1
P1 sends message with timestamp = 2
P2 receives message, new timestamp = 3
P2 internal event: timestamp = 4
P1 internal event (no sync): timestamp = 3
P2 sends message with timestamp = 5
P1 receives message, new timestamp = 6

=== Observations ===
Notice how P1's clock jumped from 3 to 6 after receiving P2's message.
This ensures P1's subsequent events have timestamps > P2's message.

B.5: Vector Clocks

The Problem with Lamport Clocks

Lamport clocks guarantee: If A → B, then LC(A) < LC(B).

But given two timestamps, we can't determine if they're causally related or concurrent.

Example:

Event X has timestamp 5
Event Y has timestamp 7

Possibilities:
- X → Y (X happened before Y)
- Y → X (Y happened before X)... wait, that's impossible if LC(X) < LC(Y)
- X || Y (concurrent)

Actually, we know Y didn't happen before X (since 7 > 5).
But we can't distinguish X → Y from X || Y!

This matters for conflict detection. If two writes are concurrent, we have a conflict. If one happened-before the other, the later one wins.

Vector Clocks: Capturing Full Causality

Idea: Instead of one counter, each process maintains a vector of counters: one for each process in the system.

If there are N processes, each maintains: [clock_for_P0, clock_for_P1, ..., clock_for_P(N-1)]

Rules:

  1. Initialize: All zeros: [0, 0, ..., 0]

  2. Before any event: Increment YOUR entry: VC[my_id] += 1

  3. When sending: Include the entire vector.

  4. When receiving vector V:

    • For each entry, take the max: VC[i] = max(VC[i], V[i])
    • Then increment your entry: VC[my_id] += 1

Let's trace through:

Process P0: VC=[0,0,0]     Process P1: VC=[0,0,0]     Process P2: VC=[0,0,0]
        │                         │                         │
[Event A]                         │                         │
VC=[1,0,0]                        │                         │
        │                         │                         │
[Send to P1]─────────────────────>│                         │
VC=[2,0,0]                        │                         │
        │                   [Receive]                       │
        │                   VC = max([0,0,0], [2,0,0])      │
        │                      = [2,0,0]                    │
        │                   Then VC[1] += 1                 │
        │                   VC = [2,1,0]                    │
        │                         │                         │
        │                   [Event B]                       │
        │                   VC=[2,2,0]                      │
        │                         │                         │
        │                   [Send to P2]───────────────────>│
        │                   VC=[2,3,0]                      │
        │                         │                   [Receive]
        │                         │                   VC=[2,3,1]
        │                         │                         │
[Event C]                         │                   [Event D]
VC=[3,0,0]                        │                   VC=[2,3,2]

Comparing Vector Clocks

Given two vectors V and W, we can determine their causal relationship:

V = W         : All entries are equal → Same logical time

V < W         : All entries of V ≤ corresponding entries of W,
                AND at least one is strictly less
                → V happened-before W

V > W         : All entries of V ≥ corresponding entries of W,
                AND at least one is strictly greater
                → W happened-before V

Otherwise     : Some entries of V < W, some entries of V > W
                → V and W are CONCURRENT (neither happened-before the other)

Examples:

[1,2,0] vs [1,3,1]
  1≤1 ✓  2≤3 ✓  0≤1 ✓
  At least one strictly less? Yes (2<3 and 0<1)
  → [1,2,0] < [1,3,1] (first happened-before second)

[2,0,0] vs [1,2,1]
  2≤1? NO (2>1)
  1≤2? Yes
  → Neither all ≤ nor all ≥
  → CONCURRENT!

[1,2,3] vs [1,2,3]
  All equal
  → EQUAL (same logical time)

B.5: Vector Clocks (continued)

Implementation in Rust

use std::cmp::Ordering;

/// The causal relationship between two vector clock timestamps.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CausalOrdering {
    /// First timestamp happened-before the second
    Before,
    /// Second timestamp happened-before the first
    After,
    /// Neither happened-before the other (concurrent)
    Concurrent,
    /// Timestamps are identical
    Equal,
}

/// A vector clock for capturing causality in distributed systems.
/// 
/// Unlike Lamport clocks, vector clocks can determine whether two events
/// are causally related or concurrent.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VectorClock {
    /// This node's ID (index into the vector)
    node_id: usize,
    /// The vector of logical times, one per node
    vector: Vec<u64>,
}

impl VectorClock {
    /// Create a new vector clock for a node in an N-node system.
    /// 
    /// # Arguments
    /// * `node_id` - This node's ID (0 to num_nodes-1)
    /// * `num_nodes` - Total number of nodes in the system
    pub fn new(node_id: usize, num_nodes: usize) -> Self {
        assert!(node_id < num_nodes, "node_id must be < num_nodes");
        VectorClock {
            node_id,
            vector: vec![0; num_nodes],
        }
    }
    
    /// Increment this node's entry before any local event.
    /// Returns a copy of the new vector.
    pub fn tick(&mut self) -> Vec<u64> {
        self.vector[self.node_id] += 1;
        self.vector.clone()
    }
    
    /// Get a timestamp for an outgoing message.
    /// Increments the clock and returns a copy of the vector.
    pub fn send_timestamp(&mut self) -> Vec<u64> {
        self.tick()
    }
    
    /// Update the clock when receiving a message.
    /// 
    /// Merges the received vector with our local vector (taking max of each entry),
    /// then increments our own entry.
    pub fn receive(&mut self, received: &[u64]) -> Vec<u64> {
        assert_eq!(
            received.len(),
            self.vector.len(),
            "Vector lengths must match"
        );
        
        // Merge: take max of each entry
        for i in 0..self.vector.len() {
            self.vector[i] = self.vector[i].max(received[i]);
        }
        
        // Receiving is an event, so increment our entry
        self.vector[self.node_id] += 1;
        self.vector.clone()
    }
    
    /// Compare two vector clock timestamps to determine causal ordering.
    /// 
    /// # Returns
    /// - `Before` if v1 happened-before v2
    /// - `After` if v2 happened-before v1
    /// - `Concurrent` if neither happened-before the other
    /// - `Equal` if they're the same timestamp
    pub fn compare(v1: &[u64], v2: &[u64]) -> CausalOrdering {
        assert_eq!(v1.len(), v2.len(), "Vectors must have same length");
        
        let mut has_less = false;    // Is any v1[i] < v2[i]?
        let mut has_greater = false; // Is any v1[i] > v2[i]?
        
        for (a, b) in v1.iter().zip(v2.iter()) {
            match a.cmp(b) {
                Ordering::Less => has_less = true,
                Ordering::Greater => has_greater = true,
                Ordering::Equal => {}
            }
        }
        
        match (has_less, has_greater) {
            (false, false) => CausalOrdering::Equal,      // All equal
            (true, false) => CausalOrdering::Before,      // v1 < v2
            (false, true) => CausalOrdering::After,       // v1 > v2
            (true, true) => CausalOrdering::Concurrent,   // Neither
        }
    }
    
    /// Get current vector without incrementing.
    pub fn current(&self) -> &[u64] {
        &self.vector
    }
}

fn main() {
    println!("=== Vector Clock Demo ===\n");
    
    // Three-node system
    let mut vc0 = VectorClock::new(0, 3);
    let mut vc1 = VectorClock::new(1, 3);
    let mut vc2 = VectorClock::new(2, 3);
    
    // Node 0 does an event
    let t_a = vc0.tick();
    println!("Node 0 event A: {:?}", t_a);  // [1, 0, 0]
    
    // Node 0 sends to Node 1
    let msg1 = vc0.send_timestamp();
    println!("Node 0 sends message: {:?}", msg1);  // [2, 0, 0]
    
    // Node 1 receives
    let t_b = vc1.receive(&msg1);
    println!("Node 1 receives, event B: {:?}", t_b);  // [2, 1, 0]
    
    // Node 2 does an independent event
    let t_c = vc2.tick();
    println!("Node 2 event C (independent): {:?}", t_c);  // [0, 0, 1]
    
    // Now let's compare timestamps!
    println!("\n=== Comparisons ===");
    
    println!("A {:?} vs B {:?}: {:?}", 
        t_a, t_b, VectorClock::compare(&t_a, &t_b));  // Before
    
    println!("B {:?} vs C {:?}: {:?}",
        t_b, t_c, VectorClock::compare(&t_b, &t_c));  // Concurrent!
    
    println!("A {:?} vs C {:?}: {:?}",
        t_a, t_c, VectorClock::compare(&t_a, &t_c));  // Concurrent!
    
    // Node 1 sends to Node 2
    let msg2 = vc1.send_timestamp();
    println!("\nNode 1 sends message: {:?}", msg2);  // [2, 2, 0]
    
    let t_d = vc2.receive(&msg2);
    println!("Node 2 receives, event D: {:?}", t_d);  // [2, 2, 2]
    
    // Now C and D are on the same node, so:
    println!("\nC {:?} vs D {:?}: {:?}",
        t_c, t_d, VectorClock::compare(&t_c, &t_d));  // Before (C → D)
    
    // And B → D through the message
    println!("B {:?} vs D {:?}: {:?}",
        t_b, t_d, VectorClock::compare(&t_b, &t_d));  // Before (B → D)
}

Output:

=== Vector Clock Demo ===

Node 0 event A: [1, 0, 0]
Node 0 sends message: [2, 0, 0]
Node 1 receives, event B: [2, 1, 0]
Node 2 event C (independent): [0, 0, 1]

=== Comparisons ===
A [1, 0, 0] vs B [2, 1, 0]: Before
B [2, 1, 0] vs C [0, 0, 1]: Concurrent
A [1, 0, 0] vs C [0, 0, 1]: Concurrent

Node 1 sends message: [2, 2, 0]
Node 2 receives, event D: [2, 2, 2]

C [0, 0, 1] vs D [2, 2, 2]: Before
B [2, 1, 0] vs D [2, 2, 2]: Before

Why Vector Clocks Matter

Vector clocks let us answer the crucial question: Are these events concurrent?

Use case: Distributed databases

Two servers receive writes for the same key:

Server A: SET user.name = "Alice"  with VC = [2, 1, 0]
Server B: SET user.name = "Bob"    with VC = [1, 3, 0]

Compare: [2,1,0] vs [1,3,0]
  First entries: 2 > 1. Second entries: 1 < 3.
  Neither vector dominates the other → CONCURRENT.

We now know there's a conflict that needs resolution (user intervention, merge function, etc.).

If instead:

Server A: SET user.name = "Alice"  with VC = [2, 1, 0]
Server B: SET user.name = "Bob"    with VC = [3, 2, 0]

Compare: [2,1,0] vs [3,2,0]
  2 ≤ 3, 1 ≤ 2, 0 ≤ 0, with at least one strictly less
  → [2,1,0] happened-before [3,2,0].

Server B's write is strictly later: no conflict, just use "Bob".
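
Putting it together: here's a sketch of a resolver built on the VectorClock::compare and CausalOrdering types from the implementation above. The resolve function and its policy are illustrative assumptions, not a real store's API.

// Pick a winner, or flag a conflict, for two competing writes.
fn resolve(value_a: &str, vc_a: &[u64], value_b: &str, vc_b: &[u64]) -> String {
    match VectorClock::compare(vc_a, vc_b) {
        // A happened-before B: B already saw A's write, so B supersedes it.
        CausalOrdering::Before => value_b.to_string(),
        CausalOrdering::After | CausalOrdering::Equal => value_a.to_string(),
        CausalOrdering::Concurrent => {
            // Real systems keep both versions ("siblings") for later
            // resolution; here we just flag the conflict.
            format!("CONFLICT({value_a}, {value_b})")
        }
    }
}

fn main() {
    // The two scenarios above:
    println!("{}", resolve("Alice", &[2, 1, 0], "Bob", &[1, 3, 0])); // CONFLICT(Alice, Bob)
    println!("{}", resolve("Alice", &[2, 1, 0], "Bob", &[3, 2, 0])); // Bob
}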


B.6: Practical Applications

Application 1: Distributed Databases

Vector clocks (or similar mechanisms) are used in real databases:

  • Amazon's Dynamo attached vector clocks to each version of a key to detect conflicting writes
  • Riak (Dynamo-inspired) exposed causal version metadata to clients for the same purpose
  • Cassandra chose last-write-wins on wall-clock timestamps instead, accepting exactly the pitfalls from B.3

When conflicts are detected, these systems either:

  • Return all conflicting versions ("siblings") to the client, which must merge them, or
  • Apply an automatic policy (last-write-wins, or a CRDT-style merge)

Application 2: Distributed Logging

When debugging distributed systems, you need to correlate logs from multiple services.

Problem: Physical timestamps might be out of sync. Service B's log entry might show an earlier timestamp than Service A, even though A sent a message to B.

Solution: Include Lamport timestamps in logs. Sort by Lamport timestamp for a causally-consistent view.
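
A small sketch of that idea. LogEntry and its fields are illustrative assumptions.

// Sort cross-service log entries by Lamport timestamp, not wall-clock time.
struct LogEntry {
    service: &'static str,
    wall_clock_ms: u64, // each service's own (possibly skewed) clock
    lamport: u64,
    message: &'static str,
}

fn main() {
    let mut entries = vec![
        // Service B's clock runs behind, so by wall-clock time the
        // receive appears to happen BEFORE the send that caused it.
        LogEntry { service: "B", wall_clock_ms: 995,   lamport: 3, message: "received request" },
        LogEntry { service: "A", wall_clock_ms: 1_000, lamport: 2, message: "sent request" },
    ];

    entries.sort_by_key(|entry| entry.lamport);
    for entry in &entries {
        println!("[lc={} wall={}ms] {}: {}",
            entry.lamport, entry.wall_clock_ms, entry.service, entry.message);
    }
    // Sorted by Lamport timestamp, "sent request" correctly precedes
    // "received request", matching causality.
}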

Application 3: CRDTs (Conflict-Free Replicated Data Types)

CRDTs are data structures designed to be replicated across servers and merged without conflicts. They use ideas from vector clocks.


B.6: Practical Applications (continued)

Application 3: CRDTs (Conflict-Free Replicated Data Types) - Implementation

Example: Grow-only Counter (G-Counter)

/// A Grow-only Counter that can be incremented on any node
/// and merged without conflicts.
/// 
/// Each node only increments its own entry. Merging takes
/// the max of each entry, so no increments are ever lost.
#[derive(Debug, Clone)]
struct GCounter {
    node_id: usize,
    counts: Vec<u64>,
}

impl GCounter {
    fn new(node_id: usize, num_nodes: usize) -> Self {
        GCounter {
            node_id,
            counts: vec![0; num_nodes],
        }
    }
    
    /// Increment the counter (only our node's entry).
    fn increment(&mut self) {
        self.counts[self.node_id] += 1;
    }
    
    /// Get the total count across all nodes.
    fn value(&self) -> u64 {
        self.counts.iter().sum()
    }
    
    /// Merge with another counter (e.g., received from another node).
    /// Takes the max of each entry: no increments are lost!
    fn merge(&mut self, other: &GCounter) {
        for i in 0..self.counts.len() {
            self.counts[i] = self.counts[i].max(other.counts[i]);
        }
    }
}

fn main() {
    println!("=== G-Counter Demo ===\n");
    
    let mut counter_a = GCounter::new(0, 3);  // Node 0
    let mut counter_b = GCounter::new(1, 3);  // Node 1
    
    // Node A increments twice
    counter_a.increment();
    counter_a.increment();
    println!("Node A after 2 increments: counts={:?}, value={}", 
        counter_a.counts, counter_a.value());  // [2, 0, 0], value=2
    
    // Node B increments three times
    counter_b.increment();
    counter_b.increment();
    counter_b.increment();
    println!("Node B after 3 increments: counts={:?}, value={}", 
        counter_b.counts, counter_b.value());  // [0, 3, 0], value=3
    
    // Now they sync up (merge in both directions)
    counter_a.merge(&counter_b);
    counter_b.merge(&counter_a);
    
    println!("\nAfter merge:");
    println!("Node A: counts={:?}, value={}", 
        counter_a.counts, counter_a.value());  // [2, 3, 0], value=5
    println!("Node B: counts={:?}, value={}", 
        counter_b.counts, counter_b.value());  // [2, 3, 0], value=5
    
    println!("\nBoth nodes agree: total = 5 (no increments lost!)");
}

Output:

=== G-Counter Demo ===

Node A after 2 increments: counts=[2, 0, 0], value=2
Node B after 3 increments: counts=[0, 3, 0], value=3

After merge:
Node A: counts=[2, 3, 0], value=5
Node B: counts=[2, 3, 0], value=5

Both nodes agree: total = 5 (no increments lost!)

═══════════════════════════════════════════════════════════════

SUMMARY

═══════════════════════════════════════════════════════════════

What You've Learned

Networking Fundamentals

Socket Programming

Failure Handling

The Eight Fallacies

  1. The network is reliable ❌
  2. Latency is zero ❌
  3. Bandwidth is infinite ❌
  4. The network is secure ❌
  5. Topology doesn't change ❌
  6. There is one administrator ❌
  7. Transport cost is zero ❌
  8. The network is homogeneous ❌

Concurrency

Serialization

  • Every network message is serialized; the format (JSON, bincode, Protocol Buffers) affects size, speed, and cross-language compatibility
  • Schema evolution needs care: old code must read new data and vice versa

Time and Ordering

  • Physical clocks drift, jump, and disagree; don't order events by wall-clock timestamps
  • Lamport clocks: if A → B then LC(A) < LC(B), but the converse doesn't hold
  • Vector clocks additionally detect concurrency, which is what conflict detection needs
  • CRDTs (like the G-Counter) merge concurrent updates without losing any