What Is Load Balancing?
A load balancer is a device or software component that sits in front of a pool of servers and distributes incoming client requests across those servers according to a defined algorithm. Without load balancing, all traffic hits a single server — once that server is saturated, response times degrade and the application fails for users.
Load balancing solves three core problems simultaneously. Scalability: you can add more servers to the pool as demand grows without changing anything on the client side. Availability: if a server fails health checks, the load balancer stops routing to it and spreads traffic to the remaining healthy servers. Performance: requests are spread so each server operates within its optimal capacity, keeping response times low.
Load balancers perform health checks on pool members — periodically sending requests (TCP connection attempts, HTTP GET requests, or custom scripts) to verify each server is up and responding correctly. A server that fails a configurable number of consecutive health checks is marked down and removed from rotation until it recovers.
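The "configurable number of consecutive health checks" rule can be sketched as a small state tracker. This is a minimal illustration, not any vendor's implementation: the class name `HealthTracker`, the `fall` and `rise` parameter names, and their default values are assumptions made for the example. The `rise` threshold (consecutive successes required before a server returns to rotation) is also an assumption; the text above only says the server stays out "until it recovers".

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Tracks consecutive probe results for one pool member (illustrative)."""
    fall: int = 3          # assumed: consecutive failures before marking down
    rise: int = 2          # assumed: consecutive successes before marking up
    healthy: bool = True
    _fails: int = 0
    _passes: int = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the current health state."""
        if probe_ok:
            self._passes += 1
            self._fails = 0
            if not self.healthy and self._passes >= self.rise:
                self.healthy = True
        else:
            self._fails += 1
            self._passes = 0
            if self.healthy and self._fails >= self.fall:
                self.healthy = False
        return self.healthy
```

Because the counters reset whenever the result flips, a single dropped probe does not remove a server from rotation; only an unbroken run of failures does.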
Load Balancing Algorithms
The algorithm determines which server receives each new connection. Different algorithms suit different workload types — no single algorithm is best for every scenario.
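As a rough illustration of how the choice of algorithm changes which server is picked, the sketch below implements two widely used strategies: round robin and least connections. These two were chosen for the example and are not drawn from the text above; the class and method names are hypothetical.

```python
from itertools import cycle


class RoundRobin:
    """Hands out servers in a fixed rotating order."""

    def __init__(self, servers):
        self._iter = cycle(list(servers))

    def pick(self):
        return next(self._iter)


class LeastConnections:
    """Picks the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection to `server` closes.
        self.active[server] -= 1
```

Round robin ignores how busy each server is, which suits short, uniform requests. Least connections tracks open connections, which tends to work better when request durations vary widely.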
Layer 4 vs Layer 7 Load Balancing
The distinction between Layer 4 and Layer 7 is one of the most common exam topics. The key difference is what information the load balancer can see and act on.
A single L7 load balancer can route /api/* requests to the API server pool, /static/* requests to a CDN origin pool, and /admin/* requests to a restricted management pool — all based on the URL path in the HTTP request. An L4 load balancer cannot do this because it never sees the URL.
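The path-based routing described above can be sketched as a prefix lookup. The pool names and the default pool are hypothetical; a real L7 load balancer would express the same idea in its own rule syntax.

```python
# (prefix, pool) pairs, checked in order. Pool names are illustrative only.
ROUTES = [
    ("/api/", "api-pool"),
    ("/static/", "static-pool"),
    ("/admin/", "admin-pool"),
]
DEFAULT_POOL = "web-pool"  # hypothetical fallback for unmatched paths


def choose_pool(path: str) -> str:
    """Return the pool for a request path using first-match prefix rules."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL
```

The function needs the request path as input, which is exactly the information an L4 load balancer never has.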
Session Persistence (Sticky Sessions)
Many web applications store session state locally on the server — user login tokens, shopping cart data, form progress. If a user's subsequent requests are routed to a different server, that server has no knowledge of their session and the user appears logged out or loses their cart.
Session persistence (also called sticky sessions or session affinity) solves this by ensuring all requests from the same client go to the same server. There are two load balancer methods, plus an application-level alternative that removes the need for persistence altogether:
| Method | How It Works | Pros | Cons |
|---|---|---|---|
| Cookie-based | Load balancer inserts a cookie identifying which server handled the first request. Subsequent requests include the cookie, so the LB always routes to the same server. | Works behind NAT; precise per-user targeting | Requires L7 LB; cookies can be cleared by user |
| Source IP hash | Client's source IP is hashed to a consistent server. Same IP always maps to the same server. | No application changes needed; works at L4 | NAT breaks it — all users behind one NAT IP hit the same server; poor distribution |
| Application-managed | Session state is stored in a shared database or cache (Redis, Memcached) accessible to all servers. Any server can handle any request. | True scalability; no single-server dependency | Requires application architecture changes; shared cache is a new component to manage |
The exam may present a scenario where users are randomly losing their sessions. This is the classic symptom of a stateful application running behind a load balancer without session persistence configured. The fix is either: (1) enable sticky sessions on the load balancer, or (2) refactor the application to use a shared session store.
Also remember: if a sticky server fails, that client's session is lost regardless of persistence settings — the session data only existed on that server.
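A minimal sketch of source IP hash persistence is shown below. The function name and the choice of SHA-256 with a modulo are assumptions for illustration; real load balancers use their own hash functions.

```python
import hashlib


def pick_by_source_ip(client_ip: str, servers: list[str]) -> str:
    """Map a client IP to a server; the same IP always maps to the same server."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

Every user behind one NAT address presents the same `client_ip`, so they all land on the same server, which is the distribution problem noted in the table. With this simple modulo scheme, changing the number of servers in the pool also remaps many clients to different servers.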
High Availability — Active/Active vs Active/Passive
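The load balancer itself can become a single point of failure. If the load balancer goes down, all traffic is lost. High availability (HA) configurations deploy load balancers in pairs. In an active/passive pair, one unit handles all traffic while the standby takes over the virtual IP if the active unit fails. In an active/active pair, both units handle traffic, and each must be able to absorb the other's load during a failure.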
Load Balancer Health Checks
Health checks are what enable automatic failover. The load balancer continuously probes each server in the pool at a configurable interval. Common health check types:
| Health Check Type | What It Tests | Use When |
|---|---|---|
| TCP check | Opens a TCP connection to the server's port. If the connection succeeds, the server is considered up. | Non-HTTP services; quick L4-level liveness check |
| HTTP/HTTPS check | Sends an HTTP GET request to a specific path (e.g. /health) and expects a 200 OK response. | Web applications; ensures the app is actually responding, not just the TCP stack |
| Custom script | Runs a script that performs a deeper check — e.g. verifies database connectivity, checks disk space, or tests a critical business function. | Critical applications where basic HTTP response isn't sufficient evidence of health |
| SSL/TLS check | Performs an HTTPS health check and validates the certificate. Useful for detecting expired certificates before they cause user-facing errors. | HTTPS endpoints; certificate monitoring |
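The TCP check in the first row of the table is the simplest to sketch: attempt a connection and report whether it succeeded. The function name and the default timeout are assumptions for illustration.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A passing result only shows that something is listening on the port. It says nothing about whether the application behind it is working, which is why the HTTP and custom script checks exist.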
SSL Termination and SSL Passthrough
HTTPS traffic requires careful handling at the load balancer. Two approaches exist, each with different security and performance implications.
SSL termination (also called SSL offloading): The load balancer holds the TLS certificate and private key. It decrypts HTTPS traffic, inspects the plain HTTP content (enabling L7 routing, cookie injection, WAF), and then forwards requests to backend servers either as plain HTTP or as re-encrypted HTTPS. This offloads CPU-intensive TLS handshake processing from backend servers. If the backend hop uses plain HTTP, traffic between the load balancer and the backend servers is unencrypted, which is only acceptable when that network segment is trusted (e.g., a private VPC).
SSL passthrough: The load balancer passes encrypted traffic straight to backend servers without decrypting it. Backend servers each hold the TLS certificate and handle decryption themselves. This provides end-to-end encryption and hides traffic from the load balancer, but the LB cannot do L7 content inspection, cookie insertion, or WAF filtering. This is an L4-only operation.
VIP (Virtual IP address) — the IP address clients connect to; floats between load balancers in HA configurations.
Pool / server farm — the group of backend servers the load balancer distributes traffic to.
VRRP — Virtual Router Redundancy Protocol; used between load balancer pairs to manage the VIP.
ADC (Application Delivery Controller) — an enterprise load balancer with advanced features: SSL termination, WAF, compression, caching (e.g. F5 BIG-IP, Citrix ADC).
Exam Scenarios
Load Balancer Deployment Architectures
Load balancers can be deployed in several architectural positions within a network, and the choice affects both functionality and security posture.
The most common deployment is as an external (internet-facing) load balancer that sits between the internet and the web server pool. In this position, the load balancer terminates inbound HTTPS connections, performs SSL offloading, and forwards requests to backend servers in the DMZ or internal network. This centralizes all TLS certificate management at the load balancer and prevents direct internet access to the backend servers — clients on the internet only communicate with the load balancer's VIP.
An internal load balancer sits inside the network and distributes traffic between internal application tiers — for example, balancing requests from web servers to application servers, or from application servers to database read replicas. Internal load balancing improves the scalability of multi-tier architectures without exposing the load balancer to the internet. In cloud environments, internal load balancers are a standard pattern for microservices — each service layer has its own internal load balancer so individual services can scale independently.
In a two-tier load balancing architecture (common in large-scale deployments), a first-tier global load balancer or GSLB routes traffic between data centers, and second-tier local load balancers within each data center distribute traffic across individual servers. This provides both geographic redundancy (handled by tier 1) and within-datacenter scaling (handled by tier 2). Cloud providers offer managed services for both tiers: AWS Route 53 in front of ALBs, or Azure Front Door in front of Azure Application Gateway, are common two-tier implementations.
Global Server Load Balancing (GSLB)
Global Server Load Balancing (GSLB) extends load balancing beyond a single data center to distribute traffic across multiple geographically dispersed data centers. While a standard load balancer operates within one site, GSLB uses DNS to route users to the optimal data center based on geographic proximity, health, and load.
When a user resolves a domain like www.example.com, instead of getting a single fixed IP address back from DNS, the GSLB-aware DNS server returns the IP address of the data center that is healthy, closest to the user, and least loaded. If the US East data center fails all health checks, the GSLB DNS stops returning its IP, and traffic is directed to US West or Europe instead, achieving geographic failover without user intervention. Because resolvers cache DNS answers, failover takes effect as cached records expire, so GSLB records typically use short TTLs. Providers such as Cloudflare, Akamai, and AWS (Route 53) offer this kind of traffic management through DNS routing policies and, in some cases, anycast routing.
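The selection logic of a GSLB-aware DNS server can be sketched as "filter by health, then rank by proximity and load". Everything below is hypothetical: the `DataCenter` fields, the distance figures, and the tie-breaking order are illustrative, and the IP addresses come from documentation ranges.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCenter:
    name: str
    vip: str
    healthy: bool
    distance_km: float  # assumed: estimated distance from the querying user
    load: float         # assumed: utilization from 0.0 to 1.0


def resolve(sites: list[DataCenter]) -> Optional[str]:
    """Return the VIP of the nearest healthy site, breaking ties by load."""
    candidates = [s for s in sites if s.healthy]
    if not candidates:
        return None
    best = min(candidates, key=lambda s: (s.distance_km, s.load))
    return best.vip


sites = [
    DataCenter("us-east", "192.0.2.10", healthy=False, distance_km=300, load=0.4),
    DataCenter("us-west", "198.51.100.10", healthy=True, distance_km=4000, load=0.5),
    DataCenter("europe", "203.0.113.10", healthy=True, distance_km=6000, load=0.2),
]
print(resolve(sites))  # us-east is down, so the us-west VIP is returned
```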
For exam scenarios: if the question asks how a company can direct users to the nearest data center automatically, or how traffic can fail over between geographically distributed data centers, the answer involves GSLB or DNS-based traffic management. The key differentiator from standard load balancing is that GSLB operates at the DNS level and spans multiple geographic locations.
Load Balancing in Cloud Environments
All major cloud providers offer managed load balancing services that eliminate the need to deploy and manage load balancer appliances. Understanding cloud load balancing options is increasingly relevant for Network+ and is directly tested on cloud certification paths.
AWS offers three current-generation load balancer types (alongside the legacy Classic Load Balancer): ALB (Application Load Balancer) for Layer 7 HTTP/HTTPS traffic with content-based routing and WebSocket support; NLB (Network Load Balancer) for Layer 4 TCP/UDP traffic with ultra-low latency and static IP addresses; and GWLB (Gateway Load Balancer) for routing traffic through third-party security appliances. The ALB is the most commonly used for web applications.
Azure provides Azure Load Balancer (Layer 4, internal and public) and Azure Application Gateway (Layer 7 with WAF capabilities, similar to AWS ALB). Azure Front Door adds global GSLB, CDN, and WAF in a unified service. Google Cloud similarly offers Cloud Load Balancing with multiple tiers covering Layer 4 and Layer 7, internal and external traffic.
In cloud environments, load balancers integrate with auto-scaling groups. Scaling decisions are driven by monitoring metrics such as average CPU utilization, connection count, or request count per instance, not by health checks themselves. When the metrics show that backend instances are overloaded, the auto-scaling group launches additional instances and registers them with the load balancer; health checks then determine when each new instance is ready to receive traffic. When load decreases, instances are deregistered and terminated. This creates elastic capacity that scales with demand without manual intervention, a key cloud advantage over on-premises deployments.
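A simplified version of the scaling decision is shown below: scale the instance count in proportion to how far the average CPU is from a target. This is a sketch of the general idea, not any provider's actual algorithm; the function name, the 60% target, and the minimum and maximum bounds are assumptions.

```python
import math


def desired_instances(current: int, avg_cpu: float, target_cpu: float = 60.0,
                      minimum: int = 2, maximum: int = 10) -> int:
    """Return the instance count that would bring average CPU near the target."""
    wanted = math.ceil(current * avg_cpu / target_cpu)
    # Clamp so the group never scales below or above its configured bounds.
    return max(minimum, min(maximum, wanted))
```

For example, four instances averaging 90% CPU against a 60% target yields a desired count of six. The minimum bound keeps redundancy in place even when traffic is very low.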
Load Balancer Security Considerations
Load balancers are not just traffic distribution devices — they occupy a critical position in the network architecture and have important security implications.
SSL/TLS certificate management is centralized at the load balancer when SSL termination is used. This means the load balancer holds the private key for the domain's certificate. If the load balancer is compromised, the private key is exposed. Hardware Security Modules (HSMs) can store private keys in tamper-resistant hardware that cannot be exported, protecting the key even if the load balancer software is compromised.
DDoS protection is often implemented at the load balancer layer. Volumetric DDoS attacks (floods of traffic) can be absorbed by cloud-based load balancers that scale far beyond a single appliance. Protocol attacks such as SYN floods can be mitigated by the load balancer's TCP proxy behavior: the load balancer completes the three-way handshake on behalf of backend servers, absorbing SYN floods before they reach the servers. Traffic that does not match a configured listener, such as an ICMP flood, is typically dropped at the load balancer rather than forwarded to the backends. Application-layer DDoS (HTTP floods) requires rate limiting and WAF rules configured at the load balancer.
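Rate limiting against HTTP floods is commonly built on a token bucket, sketched below for a single client. The token bucket is one possible technique chosen for this example, not something the text above prescribes. The class name and parameters are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate` requests per second."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = now - self.updated
        # Refill tokens for the time that has passed, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In practice a load balancer would keep one bucket per client IP or per API key and reject or delay requests when `allow` returns False.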
IP address exposure: backend server IPs are hidden behind the load balancer's VIP (Virtual IP), which is a security benefit. However, applications that log client IP addresses will see the load balancer's IP rather than the real client IP unless the load balancer is configured to pass the original IP via the X-Forwarded-For HTTP header. Proper configuration of XFF header handling is important for security logging, access control, and geographic restriction functionality.
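On the backend, recovering the client IP from X-Forwarded-For might look like the sketch below. The trusted subnet `10.0.0.0/8` and the function names are assumptions for illustration. Clients can send their own X-Forwarded-For value, so the header is only honored when the request arrives from a trusted load balancer address, and the list is read from right to left.

```python
import ipaddress
from typing import Optional

# Assumed: the subnet where our own load balancers live.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def _is_trusted(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    return any(addr in net for net in TRUSTED_PROXIES)


def client_ip(peer_ip: str, xff_header: Optional[str]) -> str:
    """Return the client IP, using X-Forwarded-For only behind a trusted proxy."""
    if not xff_header or not _is_trusted(peer_ip):
        return peer_ip
    # The rightmost entries were appended by our own proxies; the first
    # untrusted address from the right is the best available client IP.
    for hop in reversed([h.strip() for h in xff_header.split(",")]):
        try:
            if not _is_trusted(hop):
                return hop
        except ValueError:
            return peer_ip  # malformed entry: fall back to the TCP peer
    return peer_ip
```

Taking the leftmost entry instead would let an attacker choose the logged IP simply by sending a forged header.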