Cloudflare released a detailed report explaining the cause of a major network outage that disrupted global internet traffic for several hours. Millions of users and services were affected.
The outage began at 11:20 UTC and was caused by an internal configuration error rather than a cyberattack, a reminder that even the most hardened cloud platforms can fail.
This event follows similar outages at Azure and AWS, raising concerns about how dependent the world has become on large cloud providers.
Cloudflare’s issue started with a routine permissions change in its ClickHouse database cluster. At 11:05 UTC, the change made table metadata in the underlying ‘r0’ database visible to a query used by Bot Management. Because that query did not filter by database, it returned every column twice, and the generated feature file came out roughly double its normal size.
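The failure mode is easier to see with a small illustration. The Python sketch below is not Cloudflare’s code; the table and column names are hypothetical, and the real pipeline queries ClickHouse metadata. It shows how a query that filters on table name but not on database starts returning each column twice once a second database with the same tables becomes visible.

```python
# Minimal sketch (not Cloudflare's code) of how a metadata query that omits a
# database filter silently doubles its output once a second database becomes
# visible. Table and column names are illustrative.

# Simulated rows from a ClickHouse system.columns-style metadata table:
# (database, table, column_name)
metadata_rows = [
    ("default", "http_requests_features", col)
    for col in ("feature_a", "feature_b", "feature_c")
]

# After the permissions change, the underlying "r0" copies of the same tables
# become visible to the querying account as well.
metadata_rows += [
    ("r0", "http_requests_features", col)
    for col in ("feature_a", "feature_b", "feature_c")
]

def feature_columns(rows, table):
    """Mimics a query that filters on table name but not on database."""
    return [col for (_db, tbl, col) in rows if tbl == table]

features = feature_columns(metadata_rows, "http_requests_features")
print(len(features))  # 6 instead of 3: every column now appears twice,
                      # so the generated feature file roughly doubles in size.
```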
This file, regenerated every five minutes to drive machine-learning bot detection, exceeded the hard limit of 200 features that the software preallocates for. That caused failures in Cloudflare’s core proxy system, FL.
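A hard, preallocated limit turns an oversized configuration file into a fatal error rather than a recoverable one. The sketch below is an assumption about the general pattern, not Cloudflare’s actual proxy code (which is not written in Python), but it illustrates why doubling the feature file was enough to take the module down.

```python
# Minimal sketch (assumed behavior, not Cloudflare's code) of a strict feature
# limit that fails fatally when the incoming file is larger than expected.

FEATURE_LIMIT = 200  # upper bound the software preallocates memory for

def load_feature_file(features: list[str]) -> list[str]:
    if len(features) > FEATURE_LIMIT:
        # Raising here, instead of truncating or falling back to the last
        # known-good file, takes the whole module down with the bad input.
        raise RuntimeError(
            f"feature file has {len(features)} features, limit is {FEATURE_LIMIT}"
        )
    return features

load_feature_file(["f"] * 180)  # normal-sized file: accepted
load_feature_file(["f"] * 360)  # duplicated file: raises RuntimeError
```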
At first, engineers suspected a massive DDoS attack, especially since Cloudflare’s status page, which is hosted off its own infrastructure, happened to go down at the same time. The problem was also harder to trace because the permissions change had only reached part of the ClickHouse cluster, so the five-minute regeneration alternately produced good and bad files during the rollout.
When the Bot Management module failed, request scoring stopped completely. In Cloudflare’s newer FL2 proxy, this resulted in 5xx HTTP errors. Older FL versions defaulted bot scores to zero, which could block real users on sites using strict bot rules.
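The two proxy generations therefore failed in opposite ways: one failed closed, the other failed open onto a default that strict rules treat as malicious. The Python sketch below is purely illustrative (the threshold, function names, and scores are hypothetical), but it captures the difference described above.

```python
# Illustrative sketch of the two failure modes: the newer proxy fails closed
# with a 5xx, while the older one defaults the bot score to 0, which strict
# customer rules then treat as a definite bot.

def compute_bot_score() -> int:
    return 85  # placeholder score for a normal human visitor

def handle_request_fl2(bot_module_ok: bool) -> int:
    """Newer path: no score available -> return an HTTP 5xx error."""
    if not bot_module_ok:
        return 500
    return 200

def handle_request_fl(bot_module_ok: bool, customer_blocks_bots: bool) -> int:
    """Older path: no score available -> default the bot score to 0."""
    bot_score = compute_bot_score() if bot_module_ok else 0
    if customer_blocks_bots and bot_score < 30:  # example threshold
        return 403  # legitimate visitors get blocked as "bots"
    return 200

print(handle_request_fl2(bot_module_ok=False))                            # 500
print(handle_request_fl(bot_module_ok=False, customer_blocks_bots=True))  # 403
```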
The outage hit key Cloudflare services. Many websites showed error pages, latency increased, and debugging became difficult. Turnstile CAPTCHA stopped working, blocking logins. Workers KV also had higher error rates, affecting dashboard access and Cloudflare Access authentication.
Email Security briefly lost some spam-detection capability, and configuration updates slowed, though no customer data was compromised. Cloudflare restored full service by 17:06 UTC after halting propagation of the bad files, rolling back to a known-good feature file, and restarting its core proxies.
Cloudflare’s CEO, Matthew Prince, apologized and called this the company’s worst traffic outage since 2019.
This incident also reflects a broader pattern of configuration-related failures across major cloud providers.
On October 29, 2025, Azure went down globally due to a faulty change in its Front Door CDN, disrupting Microsoft 365, Teams, Xbox, and even airline systems.
AWS suffered a 15-hour outage on October 20 in US-East-1 caused by DNS issues in DynamoDB, which affected EC2, S3, Snapchat, and Roblox.
Another AWS issue on November 5 slowed Amazon.com checkouts during holiday preparation.
Experts warn that these outages show how risky heavy reliance on centralized cloud services has become: a single misconfiguration can ripple across large parts of the internet.
To avoid future problems, Cloudflare plans to harden ingestion of internally generated configuration files as if they were untrusted input, add global kill switches for features, keep error reporting from overwhelming system resources, and review failure modes across its core proxy modules.
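As a rough illustration of the first two measures, the sketch below shows internally generated files being validated like untrusted input and a global kill switch being checked before a new file is applied. It is a hypothetical Python example under stated assumptions, not Cloudflare’s actual remediation code.

```python
# Minimal sketch (hypothetical, not Cloudflare's implementation) of validating
# a generated feature file as untrusted input and honoring a kill switch.

FEATURE_LIMIT = 200
KILL_SWITCH_ENABLED = False  # would normally come from a control-plane flag

def validate_feature_file(features: list[str]) -> list[str]:
    """Reject oversized or duplicated files instead of letting them propagate."""
    if len(features) > FEATURE_LIMIT:
        raise ValueError(f"too many features: {len(features)} > {FEATURE_LIMIT}")
    if len(features) != len(set(features)):
        raise ValueError("duplicate feature names detected")
    return features

def apply_feature_file(new: list[str], last_known_good: list[str]) -> list[str]:
    """Fall back to the last known-good file if the new one fails validation."""
    if KILL_SWITCH_ENABLED:
        return last_known_good
    try:
        return validate_feature_file(new)
    except ValueError:
        return last_known_good
```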
Although this incident wasn’t caused by an attack, it highlights the need for stronger operational controls as cloud systems continue to grow.