⚡ Quick Answer
A load balancer distributes traffic across multiple servers so no single server is overwhelmed. Key algorithms: Round-robin = each server in turn (simple, equal distribution). Least connections = new request goes to whichever server has the fewest active connections. IP hash = client's IP determines which server handles all their requests (sticky). L4 load balancing = routes by IP/port only (fast). L7 load balancing = routes by HTTP content — URL, headers, cookies (smart, can do SSL termination). High availability = active/active or active/passive load balancer pairs so no single point of failure.

What Is Load Balancing?

A load balancer is a device or software component that sits in front of a pool of servers and distributes incoming client requests across those servers according to a defined algorithm. Without load balancing, all traffic hits a single server — once that server is saturated, response times degrade and the application fails for users.

Load balancing solves three core problems simultaneously. Scalability: you can add more servers to the pool as demand grows without changing anything on the client side. Availability: if a server fails health checks, the load balancer stops routing to it and spreads traffic to the remaining healthy servers. Performance: requests are spread so each server operates within its optimal capacity, keeping response times low.

Load balancers perform health checks on pool members — periodically sending requests (TCP connection attempts, HTTP GET requests, or custom scripts) to verify each server is up and responding correctly. A server that fails a configurable number of consecutive health checks is marked down and removed from rotation until it recovers.

Load Balancing Algorithms

The algorithm determines which server receives each new connection. Different algorithms suit different workload types — no single algorithm is best for every scenario.

Round-Robin (Most Common)
Requests are distributed to servers in sequence — Server 1, Server 2, Server 3, Server 1, Server 2… Best when all servers have equal capacity and requests have similar processing time. Simple and predictable. Weighted round-robin assigns a ratio (e.g. 2:1) so higher-capacity servers receive proportionally more requests.
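A minimal sketch of weighted round-robin in Python, expanding each server into the rotation according to its weight. The server names and the 2:1 ratio are illustrative, not from any particular product:

```python
from itertools import cycle

def weighted_round_robin(servers):
    """servers: list of (name, weight) pairs.
    Yields servers in turn, repeating each name 'weight' times per cycle."""
    expanded = [name for name, weight in servers for _ in range(weight)]
    return cycle(expanded)

# A 2:1 ratio: "big" has twice the capacity of "small"
rr = weighted_round_robin([("big", 2), ("small", 1)])
picks = [next(rr) for _ in range(6)]
# picks == ["big", "big", "small", "big", "big", "small"]
```

With equal weights this degenerates to plain round-robin; real load balancers use smoother interleavings, but the proportional outcome is the same.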
Least Connections (Dynamic Workloads)
Each new request goes to the server currently handling the fewest active connections. Best when requests vary significantly in processing time — long-running database queries or file uploads won't bottleneck one server while others sit idle. Weighted least connections factors in server capacity differences.
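The selection step reduces to a minimum over live connection counts. A sketch with made-up server names and counts:

```python
def least_connections(active):
    """active: dict mapping server name -> current active connection count.
    Returns the server with the fewest active connections."""
    return min(active, key=active.get)

conns = {"web1": 12, "web2": 3, "web3": 7}
target = least_connections(conns)  # "web2" has the fewest connections
conns[target] += 1                 # the new request now counts against web2
```

The weighted variant would divide each count by the server's capacity weight before taking the minimum.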
IP Hash / Source IP Affinity (Session Persistence)
The client's source IP address is hashed to always map to the same server. Provides built-in session persistence without needing application-level cookies. The same user always hits the same server unless that server goes down. Downside: uneven distribution if many clients share a NAT IP.
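The stickiness property follows directly from hashing: the same input always produces the same index. A sketch using MD5 purely as a stable hash (real load balancers use their own hash functions):

```python
import hashlib

def ip_hash(client_ip, servers):
    """Map a client IP to a server via a stable hash of the address.
    The same IP lands on the same server as long as the pool is unchanged."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

pool = ["web1", "web2", "web3"]
a = ip_hash("203.0.113.10", pool)
b = ip_hash("203.0.113.10", pool)
assert a == b  # sticky: same IP, same server
```

Note the two caveats visible in the code: every client behind one NAT address hashes to the same server, and changing the pool size remaps most clients.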
Random (Simple)
Requests are forwarded to a randomly selected server. Statistically approaches round-robin at high request volumes. Good for large, homogeneous server pools where randomness provides natural distribution without tracking overhead.
Least Response Time (Latency-Sensitive)
Combines least connections with lowest average response time — the server that responds fastest and has fewest connections wins. Best for latency-sensitive applications like real-time APIs or trading platforms. Requires the load balancer to actively measure server response times.
Resource-Based / Adaptive (Advanced)
Load balancer agents on each server report real-time CPU and memory utilization. The load balancer routes to whichever server has the most available resources. Most intelligent algorithm but requires additional agents and overhead. Used in enterprise ADC (Application Delivery Controller) platforms.

Layer 4 vs Layer 7 Load Balancing

The distinction between Layer 4 and Layer 7 is one of the most common exam topics. The key difference is what information the load balancer can see and act on.

Layer 4 — Transport Layer
Operates on IP address + TCP/UDP port
Sees: Source/destination IP and port only
Cannot see: HTTP headers, URLs, cookies, or payload content
Speed: Extremely fast — minimal processing overhead
SSL: Passes through encrypted traffic unchanged (no termination)
Use case: TCP load balancing, UDP (DNS, game servers), very high throughput scenarios
Example: AWS Network Load Balancer (NLB)
Layer 7 — Application Layer
Operates on full HTTP/HTTPS content
Sees: URL paths, HTTP headers, host headers, cookies, request body
Can do: Content-based routing, A/B testing, canary deployments
Speed: Slower than L4 because it must inspect content, though the difference is negligible for most applications
SSL: Terminates SSL/TLS — decrypts, inspects, re-encrypts (or sends plain HTTP to backend)
Use case: Web applications, microservices routing, WAF integration
Example: AWS Application Load Balancer (ALB), NGINX, HAProxy
📌 Content-Based Routing Example (L7)

A single L7 load balancer can route /api/* requests to the API server pool, /static/* requests to a CDN origin pool, and /admin/* requests to a restricted management pool — all based on the URL path in the HTTP request. An L4 load balancer cannot do this because it never sees the URL.
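The routing logic above can be sketched as an ordered prefix match over the URL path, which is essentially what an L7 balancer's rule table does. The pool names are hypothetical:

```python
# Ordered rules: first matching prefix wins.
ROUTES = [
    ("/api/",    "api-pool"),
    ("/static/", "cdn-origin-pool"),
    ("/admin/",  "management-pool"),
]

def route(path, default="web-pool"):
    """Pick a backend pool from the URL path, as an L7 balancer would."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return default

assert route("/api/users") == "api-pool"
assert route("/index.html") == "web-pool"
```

An L4 balancer never sees the path at all, so no equivalent rule can be written there.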

Session Persistence (Sticky Sessions)

Many web applications store session state locally on the server — user login tokens, shopping cart data, form progress. If a user's subsequent requests are routed to a different server, that server has no knowledge of their session and the user appears logged out or loses their cart.

Session persistence (also called sticky sessions or session affinity) solves this by ensuring all requests from the same client always go to the same server. There are two main methods:

Cookie-based: The load balancer inserts a cookie identifying which server handled the first request; subsequent requests include the cookie, so the LB always routes to the same server. Pros: works behind NAT; precise per-user targeting. Cons: requires an L7 LB; cookies can be cleared by the user.

Source IP hash: The client's source IP is hashed to a consistent server, so the same IP always maps to the same server. Pros: no application changes needed; works at L4. Cons: NAT breaks it — all users behind one NAT IP hit the same server, giving poor distribution.

Application-managed: Session state is stored in a shared database or cache (Redis, Memcached) accessible to all servers, so any server can handle any request. Pros: true scalability; no single-server dependency. Cons: requires application architecture changes; the shared cache is a new component to manage.
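A sketch of the cookie-based method from the load balancer's point of view, using a hypothetical SERVERID cookie name and placeholder pool:

```python
import random

POOL = ["web1", "web2", "web3"]
COOKIE = "SERVERID"  # hypothetical affinity cookie name

def pick_server(request_cookies):
    """Cookie-based affinity: reuse the server named in the cookie if it is
    still in the pool; otherwise pick one and pin it via the response cookie."""
    server = request_cookies.get(COOKIE)
    if server not in POOL:
        server = random.choice(POOL)
    response_cookies = {COOKIE: server}
    return server, response_cookies

# First request: no cookie, so a server is chosen and pinned.
s1, cookies = pick_server({})
# Follow-up request carries the cookie back: same server.
s2, _ = pick_server(cookies)
assert s1 == s2
```

If the pinned server is removed from the pool, the cookie no longer matches and the client is re-pinned elsewhere, which is exactly the session-loss behaviour described in the exam tip below.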
🎯 Exam Tip — Sticky Sessions Trade-off

The exam may present a scenario where users are randomly losing their sessions. This is the classic symptom of a stateful application running behind a load balancer without session persistence configured. The fix is either: (1) enable sticky sessions on the load balancer, or (2) refactor the application to use a shared session store.

Also remember: if a sticky server fails, that client's session is lost regardless of persistence settings — the session data only existed on that server.

High Availability — Active/Active vs Active/Passive

The load balancer itself can become a single point of failure. If the load balancer goes down, all traffic is lost. High availability (HA) configurations deploy load balancers in pairs.

Active/Active
Both load balancers run simultaneously and each handles a portion of traffic. Virtual IP addresses (VIPs) are managed between them via protocols like VRRP (Virtual Router Redundancy Protocol) or HSRP, typically with each device owning one VIP while acting as backup for the other's. If one fails, the survivor takes over the failed VIP and handles all traffic. Better resource utilisation since both devices are actively working.
Active/Passive (Standby)
One load balancer handles all traffic; the second is on standby and does nothing until the primary fails. On failure, the standby detects the outage and assumes the VIP. Simpler to configure but the standby device is idle hardware — a cost without immediate return. Failover is typically sub-second with modern heartbeat protocols.

Load Balancer Health Checks

Health checks are what enable automatic failover. The load balancer continuously probes each server in the pool at a configurable interval. Common health check types:

TCP check: Opens a TCP connection to the server's port; if the connection succeeds, the server is considered up. Use for non-HTTP services or a quick L4-level liveness check.

HTTP/HTTPS check: Sends an HTTP GET request to a specific path (e.g. /health) and expects a 200 OK response. Use for web applications — it ensures the app is actually responding, not just the TCP stack.

Custom script: Runs a script that performs a deeper check, e.g. verifies database connectivity, checks disk space, or tests a critical business function. Use for critical applications where a basic HTTP response isn't sufficient evidence of health.

SSL/TLS check: Performs an HTTPS health check and validates the certificate — useful for detecting expired certificates before they cause user-facing errors. Use for HTTPS endpoints and certificate monitoring.
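The first two check types above are small enough to sketch with the standard library. Hosts, ports, and the /health path are placeholders; a real load balancer would also track consecutive failures before marking a server down:

```python
import socket
import urllib.request

def tcp_check(host, port, timeout=2.0):
    """L4 liveness: can we open a TCP connection to the port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def http_check(url, timeout=2.0):
    """L7 health: does a GET to the health URL return 200 OK?"""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```

A pool member would typically be marked down only after a configurable number of consecutive failed probes (e.g. three in a row), to avoid flapping on a single dropped packet.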

SSL Termination and SSL Passthrough

HTTPS traffic requires careful handling at the load balancer. Two approaches exist, each with different security and performance implications.

🔐 SSL Termination

The load balancer decrypts HTTPS traffic, inspects the plain HTTP content (enabling L7 routing, cookie injection, WAF), and then forwards to backend servers either as plain HTTP or re-encrypted HTTPS. This offloads CPU-intensive TLS handshake processing from backend servers. The traffic between load balancer and backend servers is unencrypted — only acceptable if that network segment is trusted (e.g., private VPC).

Also called: SSL offloading. The load balancer holds the TLS certificate and private key.

🔒 SSL Passthrough

The load balancer passes encrypted traffic straight to backend servers without decrypting it. Backend servers each hold the TLS certificate and handle decryption themselves. This provides end-to-end encryption and hides traffic from the load balancer — but the LB cannot do L7 content inspection, cookie insertion, or WAF filtering. This is an L4-only operation.

🎯 Exam Tip — Key Terms to Know

VIP (Virtual IP address) — the IP address clients connect to; floats between load balancers in HA configurations.

Pool / server farm — the group of backend servers the load balancer distributes traffic to.

VRRP — Virtual Router Redundancy Protocol; used between load balancer pairs to manage the VIP.

ADC (Application Delivery Controller) — an enterprise load balancer with advanced features: SSL termination, WAF, compression, caching (e.g. F5 BIG-IP, Citrix ADC).

Exam Scenarios

Scenario 1: Users report they are randomly logged out when browsing an e-commerce site. The site uses a three-server web farm behind a load balancer. Answer: The load balancer does not have session persistence configured. Enable sticky sessions (cookie-based affinity) on the load balancer, or migrate session storage to a shared Redis cache.
Scenario 2: A company wants to route requests to /api/ to a separate pool of API servers while all other requests go to the main web server pool. Answer: This requires a Layer 7 (application layer) load balancer with content-based routing rules. An L4 load balancer cannot inspect URLs.
Scenario 3: A load balancer fails and all web traffic goes down. How should the architecture be improved? Answer: Deploy a second load balancer in an active/active or active/passive HA pair using VRRP to share a virtual IP address. If the primary fails, the secondary takes over the VIP automatically.
Scenario 4: One server in a four-server pool is significantly more powerful than the others. Traffic should reflect this difference. Answer: Use weighted round-robin (assign a higher weight to the more powerful server) or weighted least connections so the powerful server receives proportionally more requests.
Scenario 5: A company wants to offload SSL processing from backend web servers to reduce their CPU load. Answer: Configure SSL termination (SSL offloading) on the load balancer. The load balancer decrypts HTTPS from clients and forwards plain HTTP (or re-encrypted HTTPS) to backend servers, centralising all TLS processing at the load balancer.
