Microsoft: The World Runs on It, or Does It Run the World?

Understanding the Microsoft/CrowdStrike IT Outage

Newsletter Topics: July 19th Microsoft/CrowdStrike IT Outage

🗞️ In the News: You have probably heard about the massive IT outage that caused thousands of flight cancellations and disrupted banking, healthcare services, and many other industries worldwide. So, what did all the affected organizations have in common? They use CrowdStrike to protect their Microsoft endpoints (servers, personal computers, and virtual machines that run the Windows operating system).

People in IT are well aware of Microsoft's global dominance. For those outside the IT world, however, it may have been eye-opening to see how many organizations rely on Microsoft. Last week's events made it clear: Microsoft runs the world!

But wow, CrowdStrike has been quite active in expanding its presence too.

👀 Today’s Focus: The basics: What do you need to know?

  • Who? What do you need to know about the two key players?

    1. Microsoft: Known for its software products, including the Windows operating system and the Microsoft Office suite (Excel, Word, PowerPoint). Every organization relies on multiple types of endpoints to manage its IT operations. These endpoints range from employees’ computers to physical and virtual servers, and many of them run the Windows OS (operating system).

    2. CrowdStrike: Provides endpoint security, among other things. Organizations install CrowdStrike software on their endpoints to monitor for abnormal behavior and protect against cyber threats, including malware and ransomware.

  • What? On Friday, July 19, 2024, at 04:09 UTC, CrowdStrike released a configuration update for its security software on Windows. Any Windows endpoint running an affected version of that software and online between 04:09 UTC and 05:27 UTC that day received the faulty update. The update caused Windows to crash, leaving the computer or server unusable and stuck on the blue screen of death.

  • Where? The faulty CrowdStrike update was deployed to Windows endpoints over the air (remotely). Organizations had no way to control when and where this type of CrowdStrike update would be deployed. This caused millions of endpoints to crash across the globe (Microsoft estimated about 8.5 million Windows devices) and caught the affected organizations by surprise.

  • Why? Why was there such a massive impact?

    1. Organizations Taken by Surprise - As mentioned before, this faulty update caught organizations by surprise. CrowdStrike currently does not let organizations choose when and where these updates are deployed. Adding that control is one of the action items from its preliminary post-incident review.

    2. Manual Fix - To fix the issue, the faulty file needs to be deleted on each endpoint. However, because Windows crashes before it finishes starting, IT engineers need to manually start each endpoint in "Safe Mode" and then delete the faulty file. This process becomes even more complicated if Full Disk Encryption is enabled. Some automation is possible for virtual servers and machines, but manual intervention is required on physical servers and employees’ computers.

    3. Complex Recovery - Once every Windows endpoint is manually brought back to life by an IT engineer, the recovery phase starts to ensure all systems are back up and running correctly. For industries like banking or airlines that process huge volumes of data records, this means guaranteeing bad data is deleted, re-syncing databases, and so on. This is a massive effort. The more automated the organization's operations, the harder the recovery.
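
For the IT folks reading along, here is a minimal Python sketch of the two checks described above: whether an endpoint was online during the exposure window, and whether a faulty file is present in a folder. The window times come from the timeline above. The file pattern is an assumption based on CrowdStrike's public remediation guidance and is not stated in this newsletter. The function names are hypothetical, and the time check alone does not confirm impact, since the endpoint also had to be running an affected version.

```python
from datetime import datetime, timezone
from pathlib import Path

# Exposure window from the timeline above (UTC).
WINDOW_START = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)

# Assumption: pattern taken from CrowdStrike's public remediation guidance,
# not from this newsletter.
FAULTY_PATTERN = "C-00000291*.sys"


def was_exposed(online_from, online_until):
    """Return True if the endpoint's online interval overlaps the window."""
    return online_from <= WINDOW_END and online_until >= WINDOW_START


def find_faulty_files(driver_dir):
    """List files matching the faulty pattern. Dry run: nothing is deleted."""
    return sorted(Path(driver_dir).glob(FAULTY_PATTERN))
```

The second function only reports matches on purpose. On a real endpoint, an engineer would still have to boot into Safe Mode first and review the list before deleting anything.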

😎 Grandkid Opinion of the Day: Lessons from this significant IT outage:

  1. Microsoft will need to redefine its partnerships with third-party vendors like CrowdStrike, making it easier for them to integrate with Microsoft products. Although the fault lies with CrowdStrike, the widespread blue screen of death on affected Windows endpoints reflected poorly on Microsoft. Are people going to remember this as a CrowdStrike global IT outage, or a Microsoft global IT outage?

  2. CrowdStrike will face many internal changes to prevent a similar situation, as well as numerous lawsuits. This week, Delta announced plans to sue both CrowdStrike and Microsoft for operational losses estimated at $500 million. The good news is that this incident offers a significant lesson for the software industry. While CrowdStrike was at fault this time, any major software vendor could have caused such an outage given the dependency on a few vendors. The industry will see major improvements as a result of this event.

  3. Organizations will need to reassess their reliance on single vendors for critical systems, creating opportunities for smaller players to enter. Companies must re-evaluate every IT recovery process and ensure they have robust contingency plans in place.

Share a story by replying to this email! - If you or someone you know has been hacked or compromised, share your story! We are always looking to raise awareness within our community!

