BCP vs DRP — What Each Covers
These two plans are closely related but cover different scopes. The Business Continuity Plan (BCP) is the broader document — it covers how the entire organisation continues operating during a disruption, including non-IT functions. Staff communication, temporary office locations, manual business processes when systems are unavailable, vendor relationships, and customer communication all fall under BCP. The BCP exists to answer: "If our building is unavailable or our systems are down, how do we keep the business running?"
The Disaster Recovery Plan (DRP) is specifically about IT systems restoration. It documents the technical steps to restore systems, data, and network connectivity after a disaster. DRP defines which systems are restored first (based on business priority), from which backup location, using which recovery procedures. The DRP is a subset of the BCP — it covers IT recovery specifically within the broader business continuity framework.
On the exam: a question about "how long the organisation can operate while IT systems are down" is about BCP. A question about "restoring the database server after a ransomware attack" is about DRP. A question about "the maximum time to restore email service after an outage" is about RTO within the DRP.
RTO and RPO — The Two Critical Metrics
| Metric | Full Name | Definition | Example |
|---|---|---|---|
| RTO | Recovery Time Objective | Maximum acceptable time systems can be down | "Email must be restored within 4 hours of failure" |
| RPO | Recovery Point Objective | Maximum acceptable data loss (measured in time) | "We cannot lose more than 1 hour of transaction data" |
| MTBF | Mean Time Between Failures | Average time a system operates before failing | "The SAN has an MTBF of 50,000 hours" |
| MTTR | Mean Time To Repair | Average time to restore system after failure | "Database server MTTR is 2 hours" |
RTO determines how fast recovery must happen. An RTO of 4 hours for email means that if email goes down at 9am, it must be restored by 1pm. This drives decisions about hot vs warm vs cold standby systems — a very short RTO requires systems that can take over immediately (hot standby), while a longer RTO allows time to provision replacement systems (cold standby).
RPO determines how much data can be lost. An RPO of 1 hour means the interval between backups can be no longer than 1 hour — if the system fails at 10am and the last backup was at 9am, that's 1 hour of potential data loss, which just meets the RPO. An RPO of zero means no data loss is acceptable, which requires synchronous replication: every write is committed to a replica before the transaction is acknowledged.
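To make the arithmetic concrete, here is a minimal Python sketch using hypothetical timestamps and the RTO/RPO figures from the examples above; it simply computes the restore deadline and the potential data-loss window.

```python
from datetime import datetime, timedelta

# Hypothetical figures matching the examples above: a 4-hour RTO and a 1-hour RPO.
rto = timedelta(hours=4)
rpo = timedelta(hours=1)

failure_time = datetime(2024, 6, 3, 10, 0)    # system fails at 10:00
last_backup = datetime(2024, 6, 3, 9, 0)      # last successful backup at 09:00

restore_deadline = failure_time + rto          # latest acceptable time to be back online
data_loss_window = failure_time - last_backup  # work created since the last backup

print(f"Restore by {restore_deadline:%H:%M} to meet the RTO")
print(f"Potential data loss: {data_loss_window} "
      f"({'within' if data_loss_window <= rpo else 'exceeds'} the RPO)")
```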
The relationship between RTO/RPO and cost is inverse: shorter RTO and RPO require more expensive infrastructure (hot standby sites, synchronous replication, redundant systems) and operational investment. Business leadership defines acceptable RTO and RPO values by weighing the cost of recovery infrastructure against the cost of extended downtime or data loss — a core business continuity governance decision.
Backup Types — Full, Incremental, Differential
The exam tests this in calculation scenarios: "A company runs a full backup Sunday and incremental backups Monday through Saturday. The system fails Saturday evening. How many backup sets are needed for recovery?" The answer is 7: the Sunday full plus all 6 daily incrementals, because each incremental captures only the changes since the previous backup, so every set in the chain is required. If the company used differential backups instead, only 2 sets are needed: the Sunday full plus Saturday's differential, because each differential contains all changes since the last full. The sketch below walks through the same reasoning.
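This is a minimal sketch, not an exam requirement; the `restore_chain` helper and the weekly schedules are hypothetical, and it assumes a single full backup per cycle.

```python
def restore_chain(backups):
    """Return the backup sets needed (oldest first) to recover to the newest backup."""
    needed = []
    for entry in reversed(backups):              # walk back from the most recent set
        day, kind = entry
        needed.append(entry)
        if kind == "full":
            break                                # a full backup anchors the chain
        if kind == "differential":
            # A differential already holds every change since the last full,
            # so only that full is still required.
            needed.append(next(b for b in reversed(backups) if b[1] == "full"))
            break
        # An incremental only holds changes since the previous backup,
        # so keep walking back until the full is reached.
    return list(reversed(needed))

incremental_week = [("Sun", "full")] + [(d, "incremental")
                                        for d in ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat")]
differential_week = [("Sun", "full")] + [(d, "differential")
                                         for d in ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat")]

print(len(restore_chain(incremental_week)))   # 7: the full plus all six incrementals
print(len(restore_chain(differential_week)))  # 2: the full plus Saturday's differential
```

The same walk-back logic explains the trade-off: incrementals are smaller and faster to take but produce long restore chains, while differentials grow larger each day but keep the restore to two sets.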
Recovery Site Types
| Site Type | Infrastructure | Data Currency | RTO | Cost |
|---|---|---|---|---|
| Hot Site | Fully equipped, powered, and running — identical to primary | Real-time or near-real-time replication | Minutes to hours | Highest |
| Warm Site | Equipment present and powered but not fully configured — requires setup | Recent backups — hours to days old | Hours to days | Moderate |
| Cold Site | Empty space with power and connectivity — no equipment | Off-site backups must be shipped in | Days to weeks | Lowest |
| Cloud DR | On-demand provisioning — no permanent infrastructure | Configurable — near-real-time to hours | Hours (automation dependent) | Variable (pay-as-you-go) |
Hot site: A fully operational duplicate of the primary data centre, with all systems running and data replicated in real time. When a disaster occurs, traffic is redirected to the hot site and operations continue with minimal interruption. Hot sites are extremely expensive because they require maintaining a full duplicate infrastructure indefinitely, but they provide the shortest possible RTO.
Warm site: Equipment is present and powered but systems aren't fully configured or running production workloads. When a disaster occurs, administrators activate and configure the warm site systems, restore from the most recent backup, and bring services online. RTOs of several hours to a day are typical. Warm sites balance cost and recovery speed for organisations that can tolerate hours of downtime but not days.
Cold site: A facility with power, cooling, and network connectivity, but no equipment. After a disaster, the organisation ships hardware to the cold site, sets it up, restores from backup, and brings systems online. This can take days to weeks. Cold sites are appropriate for non-critical systems with long RTOs and tight budgets.
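As a rough illustration of how a target RTO narrows the choice of site type, here is a small sketch; the typical recovery times are illustrative midpoints loosely based on the table above, not fixed figures.

```python
TYPICAL_RTO_HOURS = {      # illustrative values only
    "hot site": 1,
    "cloud DR": 8,
    "warm site": 24,
    "cold site": 7 * 24,
}

def candidate_sites(target_rto_hours: float) -> list[str]:
    """Site types whose typical recovery time fits within the target RTO."""
    return [site for site, hours in TYPICAL_RTO_HOURS.items()
            if hours <= target_rto_hours]

print(candidate_sites(4))    # -> ['hot site']
print(candidate_sites(48))   # -> ['hot site', 'cloud DR', 'warm site']
```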
The 3-2-1 Backup Rule
The 3-2-1 backup rule is the industry-standard guideline for backup strategy: keep 3 copies of data (the production copy plus 2 backups), on 2 different types of media (disk and tape, or local and cloud), with 1 copy stored off-site. This protects against a single backup failure (3 copies), a media-level failure such as a NAS controller fault that affects every connected drive (2 media types), and site-level disasters like fire or flood (1 off-site copy).
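A minimal sketch of how the rule could be checked against a backup inventory follows; the inventory structure and field names are hypothetical.

```python
# Hypothetical inventory: each entry describes one copy of the data.
copies = [
    {"name": "production volume", "media": "disk",  "offsite": False},
    {"name": "local NAS backup",  "media": "disk",  "offsite": False},
    {"name": "cloud object copy", "media": "cloud", "offsite": True},
]

def satisfies_3_2_1(copies) -> bool:
    enough_copies = len(copies) >= 3                    # 3 copies of the data
    two_media = len({c["media"] for c in copies}) >= 2  # on 2 media types
    one_offsite = any(c["offsite"] for c in copies)     # 1 copy off-site
    return enough_copies and two_media and one_offsite

print(satisfies_3_2_1(copies))  # True
```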
Ransomware has updated this to the 3-2-1-1-0 rule: the additional "1" is an immutable or air-gapped copy that ransomware cannot reach (offline tape, or immutable cloud storage with object lock), and the "0" means zero errors when backups are verified through restore testing. Untested backups are not reliable backups — organisations routinely discover their backup sets are corrupt or incomplete only when they need them after a disaster.
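Restore testing is what makes the trailing "0" measurable. The sketch below shows one hypothetical way to verify a test restore by comparing checksums between the source data and the restored copy; the paths are placeholders, and a real check would compare against the state captured at backup time rather than a live, changing directory.

```python
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    """Hash a file in chunks so large backups don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """Return a list of mismatches; an empty list means zero errors."""
    errors = []
    for src in source_dir.rglob("*"):
        if not src.is_file():
            continue
        restored = restored_dir / src.relative_to(source_dir)
        if not restored.exists():
            errors.append(f"missing: {restored}")
        elif sha256(src) != sha256(restored):
            errors.append(f"checksum mismatch: {restored}")
    return errors

# Placeholder paths for illustration only.
problems = verify_restore(Path("/data/finance"), Path("/mnt/test-restore/finance"))
print("zero errors" if not problems else problems)
```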