The Backup You Have But Probably Can't Restore

Backup is one of those operational hygiene things that everyone agrees is important and almost nobody actually does well. The pattern at SMBs is depressingly consistent: backup tool is configured. Backup tool sends "Backup successful" emails every day. Months or years pass. Disaster strikes. Someone tries to restore from backup. The backup is unusable.

Each variant of "the backup is unusable" is a separate small disaster. Backup was happening but to a location nobody has access to anymore. Backup was happening but only the file system, not the database. Backup was happening but the files it produced were corrupt, and nobody noticed. Backup was happening but the restore process requires a 6-hour download from a service that's no longer paid for.

The fix isn't about backup configuration — most SMBs have that. It's about backup verification: regularly proving that the backup can actually be used to restore. This article is about the verification practice and the backup architecture that survives the verification test.

Why "backup successful" emails are misleading

A backup tool sends a "Backup successful" email when the backup operation completed without errors. That's a low bar. It does not mean:

The data in the backup is consistent (databases backed up while changing can be inconsistent)
The backup contains everything you need to actually restore
The backup format can be read by current versions of the restore tool
The destination still has the backup six months later
You have credentials to access the destination
The restore process actually works end-to-end
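One cheap check goes a step beyond the "Backup successful" email. Record a SHA-256 checksum when each backup file is written, then re-verify it later from a different machine. The sketch below assumes backups are individual files on a path you can read. The `.sha256.json` manifest naming is a hypothetical convention, not a feature of any backup tool.

The check catches files that vanished, were truncated, or were altered at the destination. It does not prove the data inside is consistent or restorable; only a restore test does that.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large backups never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_path(backup: Path) -> Path:
    # Hypothetical convention: the manifest sits next to the backup file.
    return backup.with_name(backup.name + ".sha256.json")


def write_manifest(backup: Path) -> Path:
    """Run right after the backup is written, while it is known to be complete."""
    manifest = manifest_path(backup)
    manifest.write_text(json.dumps({"file": backup.name, "sha256": sha256_of(backup)}))
    return manifest


def verify_against_manifest(backup: Path) -> bool:
    """False if the backup or its manifest is missing, or the contents changed."""
    manifest = manifest_path(backup)
    if not backup.is_file() or not manifest.is_file():
        return False
    recorded = json.loads(manifest.read_text())["sha256"]
    return recorded == sha256_of(backup)
```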

Each of these failure modes is common. Some real examples I've encountered at SMB clients:

Example 1: A SaaS company had been running daily database backups to S3 for three years. "Backup successful" every day. When a developer accidentally dropped a critical table, they tried to restore. The backups were there — but the database had been migrated to a newer version 18 months earlier, and the backup tool had been silently capturing the old (now-empty) database the entire time.

Example 2: An e-commerce store backed up nightly to a NAS in their office. Office got broken into; thieves took the NAS. Their "offsite backup" was actually only on-site.

Example 3: A consulting firm's accounting system backed up to OneDrive every night. The OneDrive account belonged to a former employee. When she left, the account was deleted by IT housekeeping. Backups had been failing silently for 4 months before someone noticed.

Example 4: A B2B SaaS had backups running and verified — they could see them in the storage destination. But the restore script hadn't been tested in two years, and when needed, it failed because of a Postgres version mismatch the team didn't know about.

In every case, "Backup successful" emails were being received the whole time. The verification gap is the disaster.
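Example 3 is the pattern to guard against: the job stopped, and silence looked like success. A check that alerts when no recent backup exists closes that gap, as long as something other than the backup tool itself runs it.

This sketch assumes backups land as files in a directory you can list, such as a mounted share or synced folder. For object storage you would list objects and their timestamps instead. The `*.dump` pattern and the 26-hour threshold (a daily schedule plus slack) are assumptions.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional


def newest_backup_age(backup_dir: Path, pattern: str, now: datetime) -> Optional[timedelta]:
    """Age of the most recently modified matching file, or None if there are none."""
    mtimes = [f.stat().st_mtime for f in backup_dir.glob(pattern) if f.is_file()]
    if not mtimes:
        return None
    return now - datetime.fromtimestamp(max(mtimes), tz=timezone.utc)


def check_freshness(
    backup_dir: Path,
    pattern: str = "*.dump",                   # assumed naming for backup files
    max_age: timedelta = timedelta(hours=26),  # daily schedule plus slack
    now: Optional[datetime] = None,
) -> str:
    now = now or datetime.now(timezone.utc)
    if not backup_dir.is_dir():
        # The destination itself is gone or no longer mounted/accessible.
        return "ALERT: backup location missing or inaccessible"
    age = newest_backup_age(backup_dir, pattern, now)
    if age is None:
        return "ALERT: no backups found"
    if age > max_age:
        return f"ALERT: newest backup is {age} old"
    return "OK"
```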

The quarterly restore test

The single highest-leverage backup practice: once per quarter, attempt to restore from a recent backup, end-to-end, in a test environment.

What "end-to-end" means:

Pull a recent backup from its destination.
Restore it to a new test environment (a sandbox database, a test cloud instance).
Verify the restored data matches expectations.
Document how long the restore took and any issues encountered.
Update runbooks based on what you learned.
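The first three steps can be scripted so each quarterly run produces the same record of timings and failures. In this minimal sketch, each step is a hypothetical callable you would write for your own stack, for example: download from storage, run the restore tool, run verification queries. The harness only times the steps and records the first failure. Documenting and updating runbooks remain human steps; the report is their raw material.

```python
import time
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    name: str
    seconds: float
    ok: bool
    note: str = ""


@dataclass
class RestoreTestReport:
    run_date: date
    steps: List[StepResult] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return bool(self.steps) and all(s.ok for s in self.steps)

    @property
    def total_seconds(self) -> float:
        return sum(s.seconds for s in self.steps)


def run_restore_test(steps: List[Tuple[str, Callable[[], None]]]) -> RestoreTestReport:
    """Run each step in order, timing it; stop at the first failure."""
    report = RestoreTestReport(run_date=date.today())
    for name, action in steps:
        start = time.monotonic()
        try:
            action()
            ok, note = True, ""
        except Exception as exc:  # record the failure in the report instead of crashing
            ok, note = False, f"{type(exc).__name__}: {exc}"
        report.steps.append(StepResult(name, time.monotonic() - start, ok, note))
        if not ok:
            break  # later steps depend on earlier ones
    return report
```

A step that raises ends the run and appears in the report with its error message. One example is a restore tool rejecting a dump from a different Postgres version, as in Example 4.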

This takes 2–4 hours per quarter. It catches every failure mode listed above before they become disasters. Skip it and you're operating on hope.

The 3-2-1 backup rule

The standard reference for backup architecture. Three copies of every important piece of data:

3 copies total: production + 2 backups
2 different storage media or locations: not all backups in one place
1 offsite copy: protected from physical disasters at the primary location

For SMBs, the practical implementation:

Production: where the data lives normally (your database, your file server, your SaaS application).

Backup 1: automated daily backup to cloud storage (S3, Backblaze B2, Wasabi). Separate from production, though still in the cloud.

Backup 2: weekly backup to a secondary cloud provider, a local appliance, or a different region in the same cloud provider. Crucially: a different failure domain from Backup 1. If the S3 region holding Backup 1 has an outage, Backup 2 should be unaffected.

For most SMBs, that means: production in AWS us-east-1, daily backups to S3 us-east-1, weekly backups to Backblaze B2 (different provider, different failure domain).

The cost of this architecture for an SMB is typically $20–80/month depending on data volume. The cost of not having it is "your business goes down indefinitely if AWS us-east-1 has a major incident."
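The rule is simple enough to check mechanically against an inventory of where your copies live. In this sketch, the `(provider, region)` pair is a rough stand-in for both "failure domain" and "different media or locations". The labels are free-form placeholders, not identifiers from any provider's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataCopy:
    name: str
    provider: str       # free-form label, e.g. "aws", "backblaze", "office-nas"
    region: str         # free-form label for location within the provider
    offsite: bool       # physically away from the primary location?
    production: bool = False


def check_321(copies: List[DataCopy]) -> List[str]:
    """Return rule violations; an empty list means the 3-2-1 checks pass."""
    problems = []
    backups = [c for c in copies if not c.production]
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies; need 3")
    # (provider, region) is a rough proxy for an independent failure domain.
    if len({(c.provider, c.region) for c in backups}) < 2:
        problems.append("backups do not span two failure domains")
    if not any(c.offsite for c in backups):
        problems.append("no offsite backup copy")
    return problems
```

Consider the setup described above: production and daily backups in AWS us-east-1, weekly backups to Backblaze B2. Run against it, the check returns an empty list. An office server backed up only to an office NAS, as in Example 2, fails all three checks.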

What to actually back up

Most SMBs back up too much in some places and too little in others:

Should be backed up daily, retained 30+ days:

Customer database(s)
Transactional records (orders, invoices, payments)
Important business documents (contracts, financials)
Critical configuration (server configurations, non-secret environment variables)
User-uploaded content (if your SaaS allows uploads)

Should be backed up weekly or monthly:

Email archives (if not handled by Google/Microsoft directly)
Logs (for compliance, depending on industry)
Marketing and analytics data
Documentation and wikis

Don't need to be backed up (can be recreated from configuration or source):

The operating system on managed cloud instances (re-provision from configuration)
Standard libraries and dependencies (re-install from package managers)
Pre-built artifacts (re-build from source)

Should NOT be backed up to your standard backup destination:

Encrypted secrets (handle separately with proper secret management)
Customer payment information (PCI compliance issues; let Stripe/Paddle handle this)
Personally identifiable information that's not necessary (GDPR data minimization)

The pattern: back up the data your business creates (irreplaceable), not the infrastructure that processes it (replaceable).

Database backup specifics

Databases are the highest-stakes backup category and the most commonly mis-configured.

Logical vs physical backups

Logical backup: a SQL dump of the data, produced by tools like pg_dump and mysqldump. Portable across versions, slower to restore for large databases.

Physical backup: a filesystem-level snapshot of the database files. Cloud providers' built-in backups (RDS automated backups, Cloud SQL automatic backups) fall into this category. Faster, but tied to the database version.

For most SMBs: do both. Daily physical backup via cloud provider's built-in tools. Weekly logical dump to S3 for long-term retention and version-portable recovery.
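A weekly logical dump can be as small as two commands: `pg_dump` in its custom format, then an upload with the AWS CLI. This sketch only builds and runs those commands. The bucket name, key prefix, and file naming are hypothetical. It assumes `pg_dump` and the `aws` CLI are installed and already authenticated.

Putting a password in the connection URL exposes it in the process list. A `.pgpass` file or the `PGPASSWORD` environment variable is the safer choice.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


def dump_commands(
    db_url: str,        # e.g. "postgresql://backup_user@db.internal/app" (no password)
    bucket: str,        # hypothetical S3 bucket name
    prefix: str,        # hypothetical key prefix, also used in the file name
    workdir: Path,
    when: Optional[datetime] = None,
) -> List[List[str]]:
    when = when or datetime.now(timezone.utc)
    local = workdir / f"{prefix}-{when:%Y-%m-%dT%H%MZ}.dump"
    return [
        # Custom format is compressed and lets pg_restore restore selected tables.
        ["pg_dump", "--format=custom", f"--file={local}", f"--dbname={db_url}"],
        ["aws", "s3", "cp", str(local), f"s3://{bucket}/{prefix}/{local.name}"],
    ]


def run_all(commands: List[List[str]]) -> None:
    for cmd in commands:
        # check=True raises on a non-zero exit, so a failed dump is never uploaded
        # or reported as a success.
        subprocess.run(cmd, check=True)
```

Scheduled weekly, the job is `run_all(dump_commands(...))`. The resulting dump is also what a version-portable restore test would pull back down.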

Point-in-time recovery (PITR)

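If your database supports it (RDS, Cloud SQL, properly-configured Postgres), enable PITR. It lets you restore to any moment within the retention window you configure, not just to nightly snapshots. A "we deleted the wrong record at 3:42pm" incident can then be recovered by restoring a copy of the database as it stood at 3:41pm and pulling the record back out. The alternative is falling back to last night's snapshot and losing a day of changes.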

PITR adds 10–30% to your database storage cost but massively improves recovery for the most common type of incident.
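On RDS, a point-in-time restore creates a new instance rather than rewinding the existing one; you then copy the lost records back. The sketch below builds the AWS CLI call for "one minute before the incident". The instance identifiers are hypothetical. The timestamp must fall inside the instance's restorable window.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def pitr_restore_command(
    source_id: str,
    incident_at: datetime,                     # must be timezone-aware
    margin: timedelta = timedelta(minutes=1),
) -> List[str]:
    """AWS CLI call that restores a NEW RDS instance as of just before the incident."""
    if incident_at.tzinfo is None:
        raise ValueError("incident_at must be timezone-aware")
    target = (incident_at - margin).astimezone(timezone.utc)
    # Hypothetical naming for the restored copy; production is left untouched.
    target_id = f"{source_id}-pitr-{target:%Y%m%d%H%M}"
    return [
        "aws", "rds", "restore-db-instance-to-point-in-time",
        "--source-db-instance-identifier", source_id,
        "--target-db-instance-identifier", target_id,
        "--restore-time", target.strftime("%Y-%m-%dT%H:%M:%SZ"),
    ]
```

Restoring into a separate instance keeps the recovery low-risk: production keeps serving while you query the deleted rows out of the copy.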

Test restores

Every quarter, restore a recent backup to a test database and verify:

All tables present
Row counts match expectations
Critical queries return correct results
Application can connect to the restored database
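The first two checks can be automated. This sketch uses Python's built-in `sqlite3` so it runs anywhere. Against Postgres or MySQL you would swap in that driver and list tables from `information_schema.tables` instead of `sqlite_master`.

The `expected` counts are an assumption; for example, counts captured from production when the backup was taken. The 1% tolerance allows for production drifting after the backup ran.

```python
import sqlite3
from typing import Dict, List


def table_row_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    # Table names come from the catalog, not user input; quoted to handle unusual names.
    return {t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0] for t in tables}


def compare_counts(expected: Dict[str, int], restored: Dict[str, int],
                   tolerance: float = 0.01) -> List[str]:
    problems = []
    for table, want in expected.items():
        if table not in restored:
            problems.append(f"missing table: {table}")
            continue
        got = restored[table]
        # Production keeps changing after the backup, so allow a small relative drift.
        if abs(got - want) > tolerance * max(want, 1):
            problems.append(f"{table}: expected ~{want} rows, restored {got}")
    return problems
```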

Document how long the restore takes. That measured time is your actual recovery time. Compare it against your Recovery Time Objective (RTO): the maximum time between disaster and being back online that your business can tolerate. If the measured time is longer than the RTO, you need a faster backup architecture.
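Once each phase of the restore test has a measured duration, comparing against the tolerable downtime is one subtraction. A minimal sketch; the phase names and target are illustrative values:

```python
from datetime import timedelta
from typing import Dict


def rto_gap(phase_durations: Dict[str, timedelta], target: timedelta) -> timedelta:
    """Positive: how far the measured restore overshoots the target. Negative: headroom."""
    measured = sum(phase_durations.values(), timedelta())
    return measured - target
```

For example, 2 hours to download, 3 to restore, and 30 minutes to verify total 5.5 hours. Against a 4-hour target, that is 1.5 hours over. The download phase is often the cheapest to shrink, for instance by keeping a recent copy closer to where you would restore.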

Backup of SaaS data

Most SMBs forget that their SaaS data is also at risk. SaaS providers don't always back up your data the way you'd want — and even when they do, you can't always access those backups.

Specific gaps:

Google Workspace: Google retains data for 30 days after deletion. After that, it's gone permanently. If you need longer retention, use a third-party Google backup tool like SpinOne, Spanning, or Backupify. Costs $4–10/user/month.

Microsoft 365: similar 30-day default retention. Third-party backup tools like Veeam, Datto, AvePoint cover the gap.

HubSpot, Salesforce, Pipedrive: most CRMs let you export data via API or built-in export tools. Schedule weekly exports to your own storage. If your CRM goes down, you have your data.

Stripe, PayPal, Square: payment processors retain data extensively, but you should still have your own copy of customer/order records. Use the API to export weekly.

Shopify, WooCommerce: built-in export tools exist; most stores never use them. Set up weekly automated exports.
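Most of these exports share one shape: page through an API, then write everything to a dated file in your own storage. In this sketch, `fetch_page` is a hypothetical adapter you would write around each vendor's API or SDK, since every vendor paginates differently. It takes a cursor and returns a page of records plus the next cursor (`None` on the last page). Output is JSON Lines, one record per line.

```python
import json
from datetime import date
from pathlib import Path
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical adapter type: takes a cursor (None for the first page) and returns
# (records on this page, next cursor), with next cursor None on the last page.
FetchPage = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def iter_records(fetch_page: FetchPage) -> Iterator[dict]:
    cursor: Optional[str] = None
    while True:
        records, cursor = fetch_page(cursor)
        yield from records
        if cursor is None:
            return


def export_to_file(fetch_page: FetchPage, out_dir: Path, source: str,
                   today: Optional[date] = None) -> Path:
    """Write every record as JSON Lines to e.g. crm-2024-01-07.jsonl."""
    today = today or date.today()
    final = out_dir / f"{source}-{today.isoformat()}.jsonl"
    partial = final.with_suffix(".partial")
    with partial.open("w", encoding="utf-8") as f:
        for record in iter_records(fetch_page):
            f.write(json.dumps(record) + "\n")
    partial.replace(final)  # only a finished export gets the final name
    return final
```

The export writes to a `.partial` file and renames it only after the loop finishes. That way a crashed export never leaves behind a file that looks complete.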

The pattern: SaaS providers are usually reliable but they can have outages, security incidents, account suspensions, billing problems. Having your own copy of SaaS data is cheap insurance.

Common SMB backup mistakes

In rough order of frequency:

Mistake 1: backing up to a free cloud storage account. Free tiers expire or get throttled. Use a paid plan with a service-level commitment.

Mistake 2: backups in the same account as production. If your AWS account is compromised, attackers can delete both production AND backups. Copying backups into a separate account, with its own credentials, is the right pattern.

Mistake 3: never testing restores. Covered extensively above.

Mistake 4: only backing up what was "in scope" three years ago. Business changes. New databases get added. New SaaS tools get adopted. The backup configuration from 2022 doesn't cover the systems added in 2024.
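A periodic diff between the systems the business runs and the systems the backup configuration covers catches this drift. Both sets are hypothetical inputs. In practice they might come from a maintained inventory spreadsheet and the backup tool's job list.

```python
from typing import Dict, List, Set


def backup_coverage_gaps(inventory: Set[str], backed_up: Set[str]) -> Dict[str, List[str]]:
    return {
        # Systems the business runs that no backup job covers.
        "not_backed_up": sorted(inventory - backed_up),
        # Backup jobs for systems missing from the inventory: retired, or the inventory is stale.
        "unknown_to_inventory": sorted(backed_up - inventory),
    }
```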

Mistake 5: backups are someone's individual responsibility. The person leaves. Backups stop. Solve via team-controlled accounts and documented procedures, not individual ownership.

Mistake 6: no retention policy. Backups accumulate forever. Storage costs creep up. Old backups that nobody would actually restore from sit costing money. Set retention policies (e.g., daily for 30 days, weekly for 12 months, monthly for 5 years), then enforce them.
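The example policy (daily for 30 days, weekly for 12 months, monthly for 5 years) is a grandfather-father-son scheme, and deciding what to keep takes a short function. This sketch makes several choices, none of them requirements:

At most one backup per date.
12 months treated as 365 days, and 5 years as 5 × 365 days.
The first backup of each ISO week and each calendar month is the one kept.

The deletion step depends on where backups live and is left out.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Set, Tuple


def backups_to_keep(
    backup_dates: Iterable[date],
    today: date,
    daily_days: int = 30,          # keep every backup for 30 days
    weekly_days: int = 365,        # keep one per ISO week for ~12 months
    monthly_days: int = 5 * 365,   # keep one per month for ~5 years
) -> Set[date]:
    dates = sorted(set(backup_dates))
    first_in_week: Dict[Tuple[int, int], date] = {}
    first_in_month: Dict[Tuple[int, int], date] = {}
    for d in dates:  # ascending, so setdefault keeps the earliest backup of each period
        iso = d.isocalendar()
        first_in_week.setdefault((iso[0], iso[1]), d)
        first_in_month.setdefault((d.year, d.month), d)

    def age(d: date) -> int:
        return (today - d).days

    keep = {d for d in dates if age(d) < daily_days}
    keep |= {d for d in first_in_week.values() if age(d) < weekly_days}
    keep |= {d for d in first_in_month.values() if age(d) < monthly_days}
    return keep
```

Everything not returned is a deletion candidate: `set(all_dates) - backups_to_keep(all_dates, date.today())`.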

When to hire help

Backup architecture and restore testing are mostly self-service for SMBs that take them seriously. When to bring in help:

You're regulated (HIPAA, PCI, SOC 2) and need backup architecture that meets specific compliance requirements.
You have complex multi-database, multi-region setups where the architecture needs design.
You're recovering from a backup failure and need help making it work.
You want a quarterly restore test done by an outsider so you have an audit trail.

The Lead Steer monthly retainer covers backup work as part of L3 ongoing support — quarterly restore tests are exactly the kind of work the retainer handles well.

What to do next

The companion articles cover other recurring L3 problems:

The Domain and DNS Disasters Nobody Plans For
The Integration That's "Almost Working"
Building an L3 Tech Stack: Monitoring and Tools — including backup monitoring

---

Part of the Level 3 Tech Support pillar guide.