CrowdStrike Outage: 5 Key Points to Strengthen Data Resilience in Your Organization

On July 19, 2024, an attempt by CrowdStrike to update the “Falcon Sensor” for real-time threat detection and endpoint protection led to a system crash that affected 8.5 million Microsoft Windows devices, causing widespread IT and operational disruptions worldwide. Although this incident was not caused by a cyberattack or malware, it underscores the importance of having a comprehensive and reliable backup and disaster recovery strategy in place to prevent disruptions to business operations.

CrowdStrike Causes Immediate Global Impact
The outage was first detected in Australia, and the “blue screen of death” soon appeared on Windows devices around the world, significantly disrupting not only individual users but also companies and critical service providers. Reports of disruptions emerged from various sectors, including finance, IT, and manufacturing. By the afternoon, approximately 2,600 flights in the U.S. had been canceled, while over 4,200 flights were affected globally and had to resort to manual check-ins, according to the Wall Street Journal.

How long recovery time objectives (RTOs) impact business operations
Following the incident, CrowdStrike provided technical support and released a patch to help restore system operations. However, many of the affected systems could not be recovered automatically by a repair program. In those cases, IT admins had to manually boot every affected device into safe mode and delete the faulty CrowdStrike update files.

Microsoft introduced a “process-minimizing” solution within the next day that helped delete the faulty files automatically, but recovery still required the laborious step of manually booting each device into WinPE from a USB drive. Downtime of this kind leads to operational disruptions, lost productivity, additional costs, increased compliance risks, and ultimately a negative customer experience and a tarnished corporate reputation.
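
To make the cost of device-by-device recovery concrete, the following is a rough sketch of the arithmetic, not a model of any real incident. It compares an estimated manual recovery time against a recovery time objective (RTO). The function names and every figure (2,000 devices, 15 minutes per device, 10 technicians, a 4-hour RTO) are hypothetical assumptions chosen only for illustration.

```python
import math


def estimated_recovery_hours(devices: int, minutes_per_device: float, technicians: int) -> float:
    """Estimate wall-clock hours needed to repair every device by hand.

    Assumes each technician repairs one device at a time and that all
    technicians work in parallel (a simplification).
    """
    if devices < 0 or minutes_per_device < 0 or technicians < 1:
        raise ValueError("devices and minutes must be >= 0, technicians >= 1")
    # Work proceeds in rounds: each round repairs up to `technicians` devices.
    rounds = math.ceil(devices / technicians)
    return rounds * minutes_per_device / 60


def meets_rto(recovery_hours: float, rto_hours: float) -> bool:
    """Return True if the estimated recovery time fits within the RTO."""
    return recovery_hours <= rto_hours


# Hypothetical figures, for illustration only.
hours = estimated_recovery_hours(devices=2000, minutes_per_device=15, technicians=10)
print(f"Estimated manual recovery: {hours:.1f} hours")
print(f"Meets a 4-hour RTO: {meets_rto(hours, rto_hours=4)}")
```

Under these assumed numbers the manual process takes far longer than the target, which is the gap that automated or image-based recovery is meant to close.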

Build a strong data protection plan to maintain business continuity at all times

  1. Comprehensive backups: Deploying a backup strategy that regularly covers all data sources and devices, leaving no isolated data unprotected, is crucial for businesses, especially those operating across multiple platforms or tools.
  2. Regular restoration drills: Equipment and system failures are never predictable. Continuously testing the recoverability of backup data is essential for verifying the effectiveness and availability of the organization’s disaster recovery plans.
  3. Instant VM recovery: Running affected services as virtual machines directly from backups restores operations as quickly as possible, reducing downtime and supporting business continuity.
  4. Cross-platform restoration: In CrowdStrike’s case, only one platform was affected. Businesses can minimize the risk of data loss by ensuring that all data, applications, and systems can be recovered and reinstated across multiple environments.
  5. Off-site backup and recovery: In addition to backing up data on-site, implementing an off-site backup mitigates the risk of data loss. A company that had deployed an off-site cloud backup before the CrowdStrike incident could have resumed services from that off-site copy far more easily.
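
As one possible illustration of the restoration drills in point 2, the sketch below checks that a restored copy of a directory matches the original by comparing SHA-256 checksums file by file. It is a minimal example under stated assumptions, not a feature of any particular backup product: the function names `sha256_of` and `verify_restore` are hypothetical, and it assumes the original data is still available for comparison during the drill.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """Compare a restored directory tree against its source.

    Returns a list of problems ("missing: ..." or "mismatch: ..."),
    which is empty when every source file was restored intact.
    """
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue  # skip directories
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(src) != sha256_of(restored):
            problems.append(f"mismatch: {rel.as_posix()}")
    return problems
```

A drill could restore the latest backup to a scratch location, call `verify_restore(Path("/data"), Path("/restore-test"))` (both paths are placeholders), and treat any non-empty result as a failed drill to investigate.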

Backups are the key to data resilience
Having a secure backup and disaster recovery plan is the key to data resilience and a crucial step for any business pursuing digital transformation. The CrowdStrike incident firmly highlights the importance of establishing a robust backup strategy and testing backups on a regular basis to maintain continuity in the face of unforeseen circumstances.
