Cloudflare’s roughly hour-long outage, which affected services including R2 object storage and Cache Reserve, was caused by a botched credential rotation for the R2 Gateway service.
All About the Outage
The outage ran from 21:38 to 22:45 UTC, during which all R2 write operations failed and roughly 35% of read operations failed worldwide. No data was lost or corrupted: uploads and changes that completed successfully during the incident were retained.
Cloudflare attributed the incident to human error: the new credentials were mistakenly deployed to a development instance of the service rather than to the production environment.
Impact on Services
The Cloudflare outage caused disruptions across multiple services:
- R2: Write operations failed and roughly 35% of read operations were unsuccessful; error rates were lower for customers serving public assets via custom domains, since cached objects could still be read.
- Billing: Customers had trouble accessing past invoices.
- Cache Reserve: Failed R2 reads drove an increase in requests to customer origins.
- Email Security: Customer-facing metrics were not updated.
- Images: Uploads failed, and the image delivery success rate dropped to around 25%.
- Key Transparency Auditor: All operations failed during the incident.
- Log Delivery: Log processing was delayed by up to 70 minutes.
- Stream: Uploads failed, and video segment delivery faced intermittent stalls.
- Vectorize: Queries and operations on indexes were impacted, with all insert and upsert operations failing.
The root cause was traced to the R2 engineering team omitting the --env parameter during the credential rotation, which deployed the new credentials to a non-production environment instead of production; once the old credentials were removed, the production R2 Gateway could no longer authenticate to the storage backend. A minimal illustration of the mistake is shown below.
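For context, Wrangler (Cloudflare's CLI for Workers) targets the default, non-production environment defined in wrangler.toml unless a named environment is selected with --env. The sketch below only illustrates that behavior; the secret name and exact commands are hypothetical and are not taken from Cloudflare's post-incident report.

```bash
# Hypothetical sketch: how an omitted --env flag misroutes a rotated credential.
# Wrangler writes to the default (non-production) environment unless --env is given.

# Intended command: push the new credential to the production environment.
wrangler secret put STORAGE_AUTH_TOKEN --env production   # prompts for the new secret value

# What effectively runs instead: without --env, the secret lands in the
# default environment, and production keeps the soon-to-be-removed credential.
wrangler secret put STORAGE_AUTH_TOKEN

# Listing secrets per environment before retiring the old credential
# is one simple check that would surface the mismatch.
wrangler secret list --env production
wrangler secret list
```

Whatever the exact tooling, verifying that the new credential is active in production before deleting the old one is the kind of check that keeps a misrouted deploy from becoming an outage.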