⚡ Quick Answer
A load balancer distributes traffic across multiple servers so no single server is overwhelmed. Key algorithms: Round-robin = each server in turn (simple, equal distribution). Least connections = new request goes to whichever server has the fewest active connections. IP hash = client's IP determines which server handles all their requests (sticky). L4 load balancing = routes by IP/port only (fast). L7 load balancing = routes by HTTP content — URL, headers, cookies (smart, can do SSL termination). High availability = active/active or active/passive load balancer pairs so no single point of failure.

What Is Load Balancing?

A load balancer is a device or software component that sits in front of a pool of servers and distributes incoming client requests across those servers according to a defined algorithm. Without load balancing, all traffic hits a single server — once that server is saturated, response times degrade and the application fails for users.

Load balancing solves three core problems simultaneously. Scalability: you can add more servers to the pool as demand grows without changing anything on the client side. Availability: if a server fails health checks, the load balancer stops routing to it and spreads traffic to the remaining healthy servers. Performance: requests are spread so each server operates within its optimal capacity, keeping response times low.

Load balancers perform health checks on pool members — periodically sending requests (TCP connection attempts, HTTP GET requests, or custom scripts) to verify each server is up and responding correctly. A server that fails a configurable number of consecutive health checks is marked down and removed from rotation until it recovers.

Load Balancing Algorithms

The algorithm determines which server receives each new connection. Different algorithms suit different workload types — no single algorithm is best for every scenario.

Round-Robin (Most Common)
Requests are distributed to servers in sequence — Server 1, Server 2, Server 3, Server 1, Server 2… Best when all servers have equal capacity and requests have similar processing time. Simple and predictable. Weighted round-robin assigns a ratio (e.g. 2:1) so higher-capacity servers receive proportionally more requests.
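A minimal sketch of weighted round-robin in Python, expanding each server into the rotation according to its weight. The server names and the 2:1 ratio are illustrative, not from any particular product:

```python
from itertools import cycle

def weighted_round_robin(servers):
    """servers: list of (name, weight) pairs.
    Yields servers in turn, repeating each name 'weight' times per cycle."""
    expanded = [name for name, weight in servers for _ in range(weight)]
    return cycle(expanded)

# A 2:1 ratio: "big" has twice the capacity of "small"
rr = weighted_round_robin([("big", 2), ("small", 1)])
picks = [next(rr) for _ in range(6)]
# picks == ["big", "big", "small", "big", "big", "small"]
```

With equal weights this degenerates to plain round-robin; real load balancers use smoother interleavings, but the proportional outcome is the same.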
Least Connections (Dynamic Workloads)
Each new request goes to the server currently handling the fewest active connections. Best when requests vary significantly in processing time — long-running database queries or file uploads won't bottleneck one server while others sit idle. Weighted least connections factors in server capacity differences.
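The selection step reduces to a minimum over live connection counts. A sketch with made-up server names and counts:

```python
def least_connections(active):
    """active: dict mapping server name -> current active connection count.
    Returns the server with the fewest active connections."""
    return min(active, key=active.get)

conns = {"web1": 12, "web2": 3, "web3": 7}
target = least_connections(conns)  # "web2" has the fewest connections
conns[target] += 1                 # the new request now counts against web2
```

The weighted variant would divide each count by the server's capacity weight before taking the minimum.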
IP Hash / Source IP Affinity (Session Persistence)
The client's source IP address is hashed to always map to the same server. Provides built-in session persistence without needing application-level cookies. The same user always hits the same server unless that server goes down. Downside: uneven distribution if many clients share a NAT IP.
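The stickiness property follows directly from hashing: the same input always produces the same index. A sketch using MD5 purely as a stable hash (real load balancers use their own hash functions):

```python
import hashlib

def ip_hash(client_ip, servers):
    """Map a client IP to a server via a stable hash of the address.
    The same IP lands on the same server as long as the pool is unchanged."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

pool = ["web1", "web2", "web3"]
a = ip_hash("203.0.113.10", pool)
b = ip_hash("203.0.113.10", pool)
assert a == b  # sticky: same IP, same server
```

Note the two caveats visible in the code: every client behind one NAT address hashes to the same server, and changing the pool size remaps most clients.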
Random (Simple)
Requests are forwarded to a randomly selected server. Statistically approaches round-robin at high request volumes. Good for large, homogeneous server pools where randomness provides natural distribution without tracking overhead.
Least Response Time (Latency-Sensitive)
Combines least connections with lowest average response time — the server that responds fastest and has fewest connections wins. Best for latency-sensitive applications like real-time APIs or trading platforms. Requires the load balancer to actively measure server response times.
Resource-Based / Adaptive (Advanced)
Load balancer agents on each server report real-time CPU and memory utilization. The load balancer routes to whichever server has the most available resources. Most intelligent algorithm but requires additional agents and overhead. Used in enterprise ADC (Application Delivery Controller) platforms.

Layer 4 vs Layer 7 Load Balancing

The distinction between Layer 4 and Layer 7 is one of the most common exam topics. The key difference is what information the load balancer can see and act on.

Layer 4 — Transport Layer
Operates on IP address + TCP/UDP port
Sees: Source/destination IP and port only
Cannot see: HTTP headers, URLs, cookies, or payload content
Speed: Extremely fast — minimal processing overhead
SSL: Passes through encrypted traffic unchanged (no termination)
Use case: TCP load balancing, UDP (DNS, game servers), very high throughput scenarios
Example: AWS Network Load Balancer (NLB)
Layer 7 — Application Layer
Operates on full HTTP/HTTPS content
Sees: URL paths, HTTP headers, host headers, cookies, request body
Can do: Content-based routing, A/B testing, canary deployments
Speed: Slower than L4 because it must inspect content, though the difference is negligible for most applications
SSL: Terminates SSL/TLS — decrypts, inspects, re-encrypts (or sends plain HTTP to backend)
Use case: Web applications, microservices routing, WAF integration
Example: AWS Application Load Balancer (ALB), NGINX, HAProxy
📌 Content-Based Routing Example (L7)

A single L7 load balancer can route /api/* requests to the API server pool, /static/* requests to a CDN origin pool, and /admin/* requests to a restricted management pool — all based on the URL path in the HTTP request. An L4 load balancer cannot do this because it never sees the URL.
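The routing logic above can be sketched as an ordered prefix match over the URL path, which is essentially what an L7 balancer's rule table does. The pool names are hypothetical:

```python
# Ordered rules: first matching prefix wins.
ROUTES = [
    ("/api/",    "api-pool"),
    ("/static/", "cdn-origin-pool"),
    ("/admin/",  "management-pool"),
]

def route(path, default="web-pool"):
    """Pick a backend pool from the URL path, as an L7 balancer would."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return default

assert route("/api/users") == "api-pool"
assert route("/index.html") == "web-pool"
```

An L4 balancer never sees the path at all, so no equivalent rule can be written there.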

Session Persistence (Sticky Sessions)

Many web applications store session state locally on the server — user login tokens, shopping cart data, form progress. If a user's subsequent requests are routed to a different server, that server has no knowledge of their session and the user appears logged out or loses their cart.

Session persistence (also called sticky sessions or session affinity) solves this by ensuring all requests from the same client always go to the same server. There are two main methods:

Cookie-based: The load balancer inserts a cookie identifying which server handled the first request; subsequent requests include the cookie, so the LB always routes to the same server. Pros: works behind NAT; precise per-user targeting. Cons: requires an L7 LB; cookies can be cleared by the user.

Source IP hash: The client's source IP is hashed to a consistent server, so the same IP always maps to the same server. Pros: no application changes needed; works at L4. Cons: NAT breaks it — all users behind one NAT IP hit the same server, giving poor distribution.

Application-managed: Session state is stored in a shared database or cache (Redis, Memcached) accessible to all servers, so any server can handle any request. Pros: true scalability; no single-server dependency. Cons: requires application architecture changes; the shared cache is a new component to manage.
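A sketch of the cookie-based method from the load balancer's point of view, using a hypothetical SERVERID cookie name and placeholder pool:

```python
import random

POOL = ["web1", "web2", "web3"]
COOKIE = "SERVERID"  # hypothetical affinity cookie name

def pick_server(request_cookies):
    """Cookie-based affinity: reuse the server named in the cookie if it is
    still in the pool; otherwise pick one and pin it via the response cookie."""
    server = request_cookies.get(COOKIE)
    if server not in POOL:
        server = random.choice(POOL)
    response_cookies = {COOKIE: server}
    return server, response_cookies

# First request: no cookie, so a server is chosen and pinned.
s1, cookies = pick_server({})
# Follow-up request carries the cookie back: same server.
s2, _ = pick_server(cookies)
assert s1 == s2
```

If the pinned server is removed from the pool, the cookie no longer matches and the client is re-pinned elsewhere, which is exactly the session-loss behaviour described in the exam tip below.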
🎯 Exam Tip — Sticky Sessions Trade-off

The exam may present a scenario where users are randomly losing their sessions. This is the classic symptom of a stateful application running behind a load balancer without session persistence configured. The fix is either: (1) enable sticky sessions on the load balancer, or (2) refactor the application to use a shared session store.

Also remember: if a sticky server fails, that client's session is lost regardless of persistence settings — the session data only existed on that server.

High Availability — Active/Active vs Active/Passive

The load balancer itself can become a single point of failure. If the load balancer goes down, all traffic is lost. High availability (HA) configurations deploy load balancers in pairs.

Active/Active
Both load balancers run simultaneously and each handles a portion of traffic. Virtual IP addresses (VIPs) are managed between them via protocols like VRRP (Virtual Router Redundancy Protocol) or HSRP, typically with each device owning one VIP while acting as backup for the other's. If one fails, the survivor takes over the failed VIP and handles all traffic. Better resource utilisation since both devices are actively working.
Active/Passive (Standby)
One load balancer handles all traffic; the second is on standby and does nothing until the primary fails. On failure, the standby detects the outage and assumes the VIP. Simpler to configure but the standby device is idle hardware — a cost without immediate return. Failover is typically sub-second with modern heartbeat protocols.

Load Balancer Health Checks

Health checks are what enable automatic failover. The load balancer continuously probes each server in the pool at a configurable interval. Common health check types:

TCP check: Opens a TCP connection to the server's port; if the connection succeeds, the server is considered up. Use for non-HTTP services or a quick L4-level liveness check.

HTTP/HTTPS check: Sends an HTTP GET request to a specific path (e.g. /health) and expects a 200 OK response. Use for web applications — it ensures the app is actually responding, not just the TCP stack.

Custom script: Runs a script that performs a deeper check, e.g. verifies database connectivity, checks disk space, or tests a critical business function. Use for critical applications where a basic HTTP response isn't sufficient evidence of health.

SSL/TLS check: Performs an HTTPS health check and validates the certificate — useful for detecting expired certificates before they cause user-facing errors. Use for HTTPS endpoints and certificate monitoring.
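The first two check types above are small enough to sketch with the standard library. Hosts, ports, and the /health path are placeholders; a real load balancer would also track consecutive failures before marking a server down:

```python
import socket
import urllib.request

def tcp_check(host, port, timeout=2.0):
    """L4 liveness: can we open a TCP connection to the port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def http_check(url, timeout=2.0):
    """L7 health: does a GET to the health URL return 200 OK?"""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```

A pool member would typically be marked down only after a configurable number of consecutive failed probes (e.g. three in a row), to avoid flapping on a single dropped packet.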

SSL Termination and SSL Passthrough

HTTPS traffic requires careful handling at the load balancer. Two approaches exist, each with different security and performance implications.

🔐 SSL Termination

The load balancer decrypts HTTPS traffic, inspects the plain HTTP content (enabling L7 routing, cookie injection, WAF), and then forwards to backend servers either as plain HTTP or re-encrypted HTTPS. This offloads CPU-intensive TLS handshake processing from backend servers. The traffic between load balancer and backend servers is unencrypted — only acceptable if that network segment is trusted (e.g., private VPC).

Also called: SSL offloading. The load balancer holds the TLS certificate and private key.

🔒 SSL Passthrough

The load balancer passes encrypted traffic straight to backend servers without decrypting it. Backend servers each hold the TLS certificate and handle decryption themselves. This provides end-to-end encryption and hides traffic from the load balancer — but the LB cannot do L7 content inspection, cookie insertion, or WAF filtering. This is an L4-only operation.

🎯 Exam Tip — Key Terms to Know

VIP (Virtual IP address) — the IP address clients connect to; floats between load balancers in HA configurations.

Pool / server farm — the group of backend servers the load balancer distributes traffic to.

VRRP — Virtual Router Redundancy Protocol; used between load balancer pairs to manage the VIP.

ADC (Application Delivery Controller) — an enterprise load balancer with advanced features: SSL termination, WAF, compression, caching (e.g. F5 BIG-IP, Citrix ADC).

Exam Scenarios

Scenario 1: Users report they are randomly logged out when browsing an e-commerce site. The site uses a three-server web farm behind a load balancer. Answer: The load balancer does not have session persistence configured. Enable sticky sessions (cookie-based affinity) on the load balancer, or migrate session storage to a shared Redis cache.
Scenario 2: A company wants to route requests to /api/ to a separate pool of API servers while all other requests go to the main web server pool. Answer: This requires a Layer 7 (application layer) load balancer with content-based routing rules. An L4 load balancer cannot inspect URLs.
Scenario 3: A load balancer fails and all web traffic goes down. How should the architecture be improved? Answer: Deploy a second load balancer in an active/active or active/passive HA pair using VRRP to share a virtual IP address. If the primary fails, the secondary takes over the VIP automatically.
Scenario 4: One server in a four-server pool is significantly more powerful than the others. Traffic should reflect this difference. Answer: Use weighted round-robin (assign a higher weight to the more powerful server) or weighted least connections so the powerful server receives proportionally more requests.
Scenario 5: A company wants to offload SSL processing from backend web servers to reduce their CPU load. Answer: Configure SSL termination (SSL offloading) on the load balancer. The load balancer decrypts HTTPS from clients and forwards plain HTTP (or re-encrypted HTTPS) to backend servers, centralising all TLS processing at the load balancer.
