Keeping a customer application online during a Cloudflare outage
Demonstrates how Route 53 failover to CloudFront plus dual-AZ origins kept a SaaS application reachable, secure, and ingesting data when Cloudflare suffered a control-plane disruption.
Availability during outage: 99.94% success
Latency change: +20 ms median
Data loss: 0 (Kinesis/S3 intact)
Overview
The team routed browser and mobile traffic through Cloudflare, while API ingestion landed in AWS us-east-1 on ALB, ECS, and API Gateway/Lambda.
Route 53 was configured with health-checked failover to an Amazon CloudFront distribution that used the same ALB as its origin, giving the team an escape hatch if Cloudflare became unavailable.
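A minimal sketch of such a failover pair, assuming both records are CNAMEs on a subdomain (a zone apex would need alias records instead). The hostnames, distribution domain, and health check ID are hypothetical placeholders; the resulting dictionary follows the ChangeBatch shape accepted by boto3's Route 53 `change_resource_record_sets(HostedZoneId=..., ChangeBatch=...)`.

```python
import json


def failover_change_batch(record_name, cloudflare_target, cloudfront_domain,
                          health_check_id, ttl=60):
    """Build a Route 53 ChangeBatch with a PRIMARY/SECONDARY failover pair."""

    def record(set_id, role, target, extra):
        rrset = {
            "Name": record_name,
            "Type": "CNAME",
            "SetIdentifier": set_id,
            "Failover": role,
            "TTL": ttl,
            "ResourceRecords": [{"Value": target}],
        }
        rrset.update(extra)
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Comment": "Cloudflare primary, CloudFront secondary",
        "Changes": [
            # Primary: answered while the health check on the Cloudflare path passes.
            record("cloudflare-primary", "PRIMARY", cloudflare_target,
                   {"HealthCheckId": health_check_id}),
            # Secondary: answered only when the primary is reported unhealthy.
            record("cloudfront-secondary", "SECONDARY", cloudfront_domain, {}),
        ],
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    batch = failover_change_batch(
        record_name="app.example.com",
        cloudflare_target="app.example.com.cdn.cloudflare.net",
        cloudfront_domain="d1234abcd.cloudfront.net",
        health_check_id="11111111-2222-3333-4444-555555555555",
    )
    print(json.dumps(batch, indent=2))
```

The secondary record carries no health check, so Route 53 treats it as always available and answers with the CloudFront domain whenever the primary's check fails.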
Challenges
- Cloudflare control-plane and POP issues produced widespread 522/525 errors and elevated latency.
- Customer traffic began failing in affected regions within two minutes of Cloudflare's faulty configuration rollout.
- Leadership needed assurance that ingestion pipelines and TLS posture would survive a failover from Cloudflare to CloudFront.
Approach
Health-checked DNS failover
Route 53 monitored the Cloudflare hostname and automatically shifted traffic to CloudFront within about one minute of detecting the outage.
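The health check and the rough timing behind a failover of about a minute can be sketched as follows. This assumes an HTTPS check against a hypothetical hostname (`cf.example.com`) that always routes through Cloudflare; pointing the check at the failover record itself would make it follow traffic to CloudFront and report healthy. The config dictionary matches the `HealthCheckConfig` parameter of boto3's `create_health_check`, which also requires a unique `CallerReference`. The timing helper is a simplified estimate that ignores how Route 53 aggregates results across its checker locations and how individual resolvers treat TTLs.

```python
def health_check_config(fqdn, path="/health", interval_s=10, failure_threshold=3):
    """HealthCheckConfig for an HTTPS check against a Cloudflare-fronted hostname."""
    if interval_s not in (10, 30):
        raise ValueError("Route 53 supports request intervals of 10 or 30 seconds")
    if not 1 <= failure_threshold <= 10:
        raise ValueError("failure threshold must be between 1 and 10")
    return {
        "Type": "HTTPS",
        "FullyQualifiedDomainName": fqdn,  # must NOT be the failover record itself
        "ResourcePath": path,
        "Port": 443,
        "RequestInterval": interval_s,
        "FailureThreshold": failure_threshold,
    }


def worst_case_failover_seconds(interval_s, failure_threshold, ttl_s):
    """Rough upper bound: consecutive failed checks, then one TTL of cached answers."""
    return interval_s * failure_threshold + ttl_s


if __name__ == "__main__":
    print(health_check_config("cf.example.com"))  # hypothetical hostname
    print(worst_case_failover_seconds(10, 3, 60))  # 90
```

With a 10-second interval, a threshold of 3, and a 60-second TTL, the estimate is 90 seconds in the worst case; clients whose resolvers had already expired the old answer move to CloudFront sooner.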
Capacity buffer for origin services
ALB targets spanned two AZs, and ECS tasks were scaled up 2x to absorb CloudFront cache misses while keeping the /health endpoints green.
Ingestion continuity and security parity
API Gateway and Lambda kept writing to Kinesis and S3 without interruption, while AWS WAF mirrored Cloudflare’s critical rules to maintain protection.
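A minimal sketch of mirroring one rate-limiting rule into AWS WAF, assuming the Cloudflare rule is expressed in requests per minute per client IP. The rule name, priority, and limit are hypothetical. WAF rate-based rules count over a five-minute window by default, so the helper scales the per-minute limit accordingly; the rule dictionary follows the shape of a WAFv2 `Rules` entry, and a web ACL attached to CloudFront must be created with `Scope="CLOUDFRONT"` in us-east-1.

```python
def per_five_minutes(requests_per_minute: int) -> int:
    """Convert a per-minute limit into WAF's default five-minute evaluation window."""
    return requests_per_minute * 5


def rate_limit_rule(name: str, limit: int, priority: int) -> dict:
    """WAFv2 rule that blocks a client IP once it exceeds `limit` requests per window."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {
                "Limit": limit,
                "AggregateKeyType": "IP",  # count requests per source IP
            }
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


if __name__ == "__main__":
    # Hypothetical Cloudflare rule: 100 requests per minute per IP on the API.
    rule = rate_limit_rule("api-rate-limit", per_five_minutes(100), priority=1)
    print(rule["Statement"]["RateBasedStatement"]["Limit"])  # 500
```

Scaling a per-minute limit this way allows short bursts above the original rate as long as the five-minute total stays under the limit, so it approximates Cloudflare's behavior rather than reproducing it exactly.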
Impact delivered
- Route 53 failover preserved availability with only a brief blip while resolver caches expired; customers could continue browsing and transacting.
- Median latency rose by roughly 20 ms while CloudFront caches warmed up but stayed within the SLO.
- No data loss occurred; ingestion pipelines continued operating and security controls remained enforced.
Key lessons
- Validate Route 53 failover paths regularly so TTLs and health checks behave during real incidents.
- Mirror essential WAF and rate-limiting rules between Cloudflare and CloudFront to preserve security posture.
- Pre-warm critical CloudFront caches and run game days that rehearse CDN and AZ failures to reduce surprises during real outages.
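A minimal cache pre-warming sketch, assuming a hypothetical distribution domain and list of critical paths. It requests each path once through CloudFront, so it only populates the edge location serving the machine that runs it; warming other regions would mean running it from those regions. The `fetch` parameter is injectable so a game-day harness can substitute its own HTTP client.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Union


def default_fetch(url: str) -> int:
    """Issue a GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status


def prewarm(domain: str, paths: Iterable[str],
            fetch: Callable[[str], int] = default_fetch) -> Dict[str, Union[int, str]]:
    """Request each critical path once via CloudFront and record the outcome per URL."""
    results: Dict[str, Union[int, str]] = {}
    for path in paths:
        url = f"https://{domain}{path}"
        try:
            results[url] = fetch(url)
        except urllib.error.HTTPError as exc:
            results[url] = exc.code  # 4xx/5xx responses still carry a status code
        except OSError as exc:
            results[url] = str(exc)  # DNS, TLS, or timeout failures
    return results
```

For example, `prewarm("d1234abcd.cloudfront.net", ["/", "/static/app.js"])` (hypothetical values) returns a status per URL; anything other than 200 flags a path to investigate before a real failover.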