Today (Friday, July 19, 2024), many IT and business professionals woke up to a global calamity. A routine software update from CrowdStrike, a leading cybersecurity technology company, caused a massive global technology outage for nearly anyone running CrowdStrike’s Falcon product on a Windows operating system. The update, intended to enhance the security of Windows computers, contained a defect that caused machines with CrowdStrike’s Falcon Sensor installed to crash and display the infamous "blue screen of death." The issue was first noticed in Australia and quickly spread to other regions, including Asia, Europe, and the United States.
Industries and Organizations Affected by CrowdStrike Outage
The outage has had a significant impact on various industries and organizations worldwide:
- Airports and Airlines: Major airlines such as American Airlines, Delta Air Lines, and United Airlines experienced severe delays and cancellations due to communication problems.
- Banks: Customers in the United States, Australia, New Zealand, and other regions reported issues accessing their accounts at major retail banks.
- Retail: McDonald’s in Japan closed some stores due to cash register malfunctions, and the British grocery chain Waitrose had to accept only cash payments.
- Law Enforcement: Agencies like the Alaska State Troopers reported issues, including temporary disruptions to 911 services.
- Healthcare: Hospitals and other healthcare facilities faced disruptions in their operations.
- Government Departments: Various government departments that rely on CrowdStrike’s cybersecurity software were also affected.
Fixes and Developments
CrowdStrike has acknowledged the issue and confirmed that it was caused not by a cyberattack but by the aforementioned defect in the software update for its Falcon Sensor. The company has identified and isolated the issue and deployed a fix. However, due to the complexity of the fix, some businesses and organizations may continue to experience outages into the weekend or next week. CrowdStrike’s CEO, George Kurtz, has apologized for the disruption and assured customers that the company is working diligently to resolve the issue.
CrowdStrike said it is working with “customers impacted by a defect found in a single content update for Windows hosts. Mac and Linux hosts are not impacted.”
CrowdStrike has identified a Channel File in the update as the culprit for today's global IT outage. This file can be addressed individually, allowing users to retain the Falcon Sensor update. The company has provided workaround steps for affected systems.
For the most up-to-date information on this CrowdStrike outage or if you need help, SBS recommends you visit CrowdStrike’s support forum or follow posts such as this X thread for support. You can also reach out to your CrowdStrike Concierge Security Team (CST) for the best support.
For recovery, start with a simple reboot of your machine or Azure environment. “We have received reports of successful recovery from some customers attempting multiple Virtual Machine restart operations on affected Virtual Machines,” Microsoft said. “We’ve received feedback from customers that several reboots (as many as 15 have been reported) may be required, but overall feedback is that reboots are an effective troubleshooting step at this stage.”
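The repeated-restart advice above can be sketched as a bounded retry loop. This is only an illustration: `restart` and `is_healthy` are hypothetical callables you would supply for your own environment (for example, a cloud API restart call and a ping or agent check), and the wait time between attempts is an assumed value, not a figure from Microsoft or CrowdStrike.

```python
import time
from typing import Callable


def restart_until_healthy(
    restart: Callable[[], None],
    is_healthy: Callable[[], bool],
    max_attempts: int = 15,      # Microsoft reported as many as 15 reboots
    wait_seconds: float = 300.0, # assumed settle time; tune for your hosts
) -> int:
    """Restart a host until the health check passes.

    Returns the number of restarts performed. Raises RuntimeError if the
    host is still unhealthy after max_attempts restarts.
    """
    for attempt in range(1, max_attempts + 1):
        restart()
        # Give the host time to boot (and, if still affected, to crash).
        time.sleep(wait_seconds)
        if is_healthy():
            return attempt
    raise RuntimeError(f"host still unhealthy after {max_attempts} restarts")
```

Capping the attempts keeps the loop from running forever on a host that needs the manual workaround instead.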
If rebooting/restarting your devices or environment does not work, here are some additional steps to recover from this CrowdStrike outage:
For servers/workstations affected by the update:
- Boot Windows into Safe Mode or the Windows Recovery Environment:
  - Restart your computer and press the appropriate key (usually F8 or Shift+F8) to enter Safe Mode or the Windows Recovery Environment
- Navigate to the CrowdStrike Directory:
  - Once in Safe Mode or the Windows Recovery Environment, navigate to the following directory:
  - C:\Windows\System32\drivers\CrowdStrike
- Locate and Delete the Faulty File:
  - In the CrowdStrike directory, locate the file matching C-00000291*.sys and delete it
- Boot the Host Normally:
  - Restart your computer normally. If your system does not crash within a few minutes, the workaround is successful
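For teams repeating the workaround across many disks, the locate-and-delete step can be sketched in Python. This assumes a Python interpreter is available, which may not be true inside Safe Mode or the Windows Recovery Environment, so it is more likely to be useful when the affected disk is mounted on another machine. The function name and the `dry_run` parameter are illustrative, not part of any CrowdStrike tooling.

```python
from pathlib import Path

# Directory and pattern as given in the workaround steps.
DEFAULT_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def remove_faulty_channel_files(directory=DEFAULT_DIR, dry_run=True):
    """Find channel files matching the faulty pattern and delete them.

    With dry_run=True (the default) nothing is deleted; the matching
    paths are only returned so they can be reviewed first.
    """
    directory = Path(directory)
    matches = sorted(directory.glob(PATTERN))
    for path in matches:
        if not dry_run:
            path.unlink()
    return matches
```

Run it once with the default `dry_run=True` to review the matches, then again with `dry_run=False` to delete them. When the disk is mounted elsewhere, pass the mounted path to the CrowdStrike directory instead of the default.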
For cloud environments, you can follow these additional steps:
For AWS (Amazon Web Services):
- Detach the EBS volume from the impacted EC2 instance
- Attach the EBS volume to a new EC2 instance
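- Fix the CrowdStrike driver folder on the attached volume by deleting the file matching C-00000291*.sys, as in the server/workstation steps above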
- Detach the EBS volume from the new EC2 instance
- Attach the EBS volume back to the impacted EC2 instance
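The AWS sequence above could be scripted roughly as follows. This is a sketch under several assumptions: `ec2` is a boto3 EC2 client (for example `boto3.client("ec2")`), the impacted instance has already been stopped so its volume can be detached, and the device names `xvdf` and `/dev/sda1` are placeholders that must match your instances. The `repair` callback is hypothetical and stands in for whatever you use to delete the faulty file on the rescue instance.

```python
from typing import Any, Callable


def repair_volume_via_rescue_instance(
    ec2: Any,                      # assumed: a boto3 EC2 client
    volume_id: str,
    impacted_instance_id: str,
    rescue_instance_id: str,
    repair: Callable[[], None],    # hypothetical: fixes the driver folder
    rescue_device: str = "xvdf",         # placeholder device name
    original_device: str = "/dev/sda1",  # placeholder device name
) -> None:
    """Move a volume to a rescue instance, repair it, and move it back."""

    def move(from_instance: str, to_instance: str, device: str) -> None:
        # Detach, wait until the volume is free, attach, wait until in use.
        ec2.detach_volume(VolumeId=volume_id, InstanceId=from_instance)
        ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
        ec2.attach_volume(
            VolumeId=volume_id, InstanceId=to_instance, Device=device
        )
        ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])

    move(impacted_instance_id, rescue_instance_id, rescue_device)
    repair()
    move(rescue_instance_id, impacted_instance_id, original_device)
```

Passing the client in as a parameter keeps the function easy to dry-run against a stub before pointing it at a real account. Taking a snapshot of the volume first is a sensible precaution that this sketch does not include.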
For Azure:
- Log in to the Azure console
- Go to Virtual Machines and select the affected VM
- In the upper left of the console, click “Connect”
- Click “More ways to Connect” and then select “Serial Console”
- Once SAC has loaded, type in cmd and press Enter
- Type ch -si 1 and press the space bar
- Enter Administrator credentials
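- Type the following commands (both set the same safeboot value, so the last one run determines the mode: minimal for Safe Mode, network for Safe Mode with Networking):
  - bcdedit /set {current} safeboot minimal
  - bcdedit /set {current} safeboot network
- Restart the VM
- To confirm the boot state, run the command:
  - wmic COMPUTERSYSTEM GET BootupState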
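If you are checking the boot state across many VMs, the output of the wmic command can be parsed with a small helper. This sketch assumes the usual two-line wmic layout (a `BootupState` header followed by the value) and that safe-mode values begin with "Fail-safe", as in "Fail-safe boot" and "Fail-safe with network boot". The function names are illustrative.

```python
def parse_bootup_state(wmic_output: str) -> str:
    """Extract the value from `wmic COMPUTERSYSTEM GET BootupState` output."""
    # wmic pads lines and emits blank lines; keep only non-empty content.
    lines = [line.strip() for line in wmic_output.splitlines() if line.strip()]
    if len(lines) < 2 or lines[0].lower() != "bootupstate":
        raise ValueError("unexpected wmic output")
    return lines[1]


def is_safe_mode(state: str) -> bool:
    """Return True when the reported state looks like a Safe Mode boot."""
    return state.lower().startswith("fail-safe")
```

For example, `is_safe_mode(parse_bootup_state(output))` is expected to be true after the VM restarts into Safe Mode and false for "Normal boot".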
By following these steps, you should be able to apply the fix and recover from the CrowdStrike outage. If these steps do not work for you, there are many other available resources online for your particular environment that may be more helpful.
“We further recommend organizations ensure they’re communicating with CrowdStrike representatives through official channels,” CrowdStrike said. “Our team is fully mobilized to ensure the security and stability of CrowdStrike customers.”
Recovery and Prevention
Organizations affected by the outage should take the following steps to recover and prevent similar issues in the future:
- Apply the Fix: Ensure that the latest fix from CrowdStrike is applied to all affected systems.
- Back Up Data: Regularly back up critical data to prevent data loss in case of future outages.
- Update Policies: Review and update IT policies to include procedures for handling software updates and potential outages.
- Communication Plan: Establish a clear communication plan to inform employees and customers about the status of the outage and recovery efforts.
- Monitor Systems: Continuously monitor systems for any signs of issues and address them promptly.
By taking these steps, organizations can minimize the impact of the current outage and be better prepared for any future incidents.
One last thing: Please be alert for social engineering attacks (phishing, vishing, smishing, or other impersonations) that will attempt to take advantage of the situation. Threat actors are actively targeting potentially impacted organizations by posing as IT support, CrowdStrike, or Microsoft.