Two weeks back, we talked about Mean Time To Detect (MTTD) and why understanding it matters for making sure your cybersecurity architecture is working properly. Today I want to talk about a different question: what is MTTR? It stands for Mean Time To Repair, Recover, Resolve, or Restore (take your pick), and it represents the average time it takes to repair a system once a failure is discovered. MTTR is critical to understanding how big a setback a detected breach is: whether it can be resolved immediately, or is so damaging that recovery will take hours, days, or weeks. It is important to qualify "Recovered/Resolved/Restored" in this definition as the point where full functionality is reached, not "barely working", and there is definitely a difference between the two in the midst of a security event.
When we look at security breaches such as Sony, Colonial Pipeline, or SolarWinds, the MTTD might have been months, but the MTTR was also very long. There are stories about how Sony employees were reduced to writing memos by hand for weeks after the hack, and the company wasn't back to full operational functionality for months. One of the most dangerous parts of this is that an organization that is already compromised, and still in the process of resolving the issue, remains very much at risk and vulnerable. A longer MTTR means a longer window between the event being detected and the event being fully resolved.
And that time can quite literally cost millions of dollars an hour for some larger organizations. How much would it cost your organization to be completely unable to use computers for an hour? A day? A week? Calculate how much work you can't do during that period, calculate the extra work you'll have to do to get back to operational, and add it all up to see a full accounting of the initial cost of the breach. All of it is costly, and all of it is damaging to your business.
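To make that accounting concrete, here is a minimal Python sketch of the arithmetic. Every figure and parameter name in it (lost revenue per hour, number of idle staff, hourly rates, and the guess that recovery work runs to twice the outage length) is a hypothetical placeholder, not data from a real breach; swap in your own numbers.

```python
def downtime_cost(hours_down, lost_revenue_per_hour, idle_staff,
                  staff_hourly_cost, recovery_hours, recovery_hourly_cost):
    """Rough initial cost of an outage: work you can't do plus work to get back."""
    lost_revenue = hours_down * lost_revenue_per_hour        # business not done
    idle_labor = hours_down * idle_staff * staff_hourly_cost  # staff paid but unable to work
    recovery_work = recovery_hours * recovery_hourly_cost     # extra effort to restore systems
    return lost_revenue + idle_labor + recovery_work


# Hypothetical organization: all inputs below are illustrative assumptions.
for label, hours in [("hour", 1), ("day", 24), ("week", 168)]:
    cost = downtime_cost(
        hours_down=hours,
        lost_revenue_per_hour=10_000,
        idle_staff=50,
        staff_hourly_cost=40,
        recovery_hours=hours * 2,   # assumption: recovery effort is twice the outage
        recovery_hourly_cost=75,
    )
    print(f"One {label} down: ${cost:,.0f}")
```

This only captures the initial cost described above; it leaves out anything that is harder to put a number on.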
To determine Mean Time To Recover, which is likely what most businesses want to know, we would express the situation as a straightforward math equation:

MTTR = (total time spent recovering from all incidents) / (number of incidents)
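As a rough illustration of that average, here is a minimal Python sketch. It assumes you log two timestamps per incident, when it was detected and when full functionality was restored; the function name and the incident log itself are hypothetical examples.

```python
from datetime import datetime, timedelta


def mean_time_to_recover(incidents):
    """Average of (restored - detected) over a list of (detected, restored) pairs."""
    if not incidents:
        raise ValueError("need at least one incident to compute MTTR")
    total = sum((restored - detected for detected, restored in incidents),
                timedelta())
    return total / len(incidents)


# Hypothetical incident log: (detected, fully restored) timestamps.
incidents = [
    (datetime(2022, 1, 3, 9, 0), datetime(2022, 1, 3, 13, 0)),    # 4 hours
    (datetime(2022, 2, 10, 8, 0), datetime(2022, 2, 12, 8, 0)),   # 48 hours
    (datetime(2022, 3, 1, 22, 0), datetime(2022, 3, 2, 6, 0)),    # 8 hours
]

mttr = mean_time_to_recover(incidents)
print(f"MTTR: {mttr.total_seconds() / 3600:.1f} hours")  # (4 + 48 + 8) / 3 = 20.0 hours
```

Note that the "restored" timestamp should mark the return to full functionality, not the moment systems are barely working again, or the result will understate your real MTTR.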
Imagine what that would look like at your company if you had a single system breached, a dozen systems breached, or a full outage. London Security has been brought in to assist with major breaches in the past, and one of the things you quickly learn is that every breach is slightly different and doesn't always take the same amount of time to resolve. The number of affected computers doesn't always matter as much as which systems were affected. If the CEO's laptop is breached, it might take a lot longer to recover from than your entire sales team having their computers infected with ransomware. It might take less time. But in many situations it has taken weeks of verifying the integrity of systems post-breach before an organization is prepared to be "fully functional" again, and that means a rather large amount of time, money, and resources spent resolving severe security threats.
The worst thing that can happen post-breach is creating a new problem in the process of resolving the old one. Oftentimes, you will think a system is fully clean of an infection, then run multiple scans and realize it has tried to call back to command and control servers thousands of times over the past hour. Or you'll discover that your security solution requires additional oversight and reporting in order to maintain the level of security you now have as a baseline.
All of these will indirectly affect the recovery time and increase the MTTR value.
Much like reducing MTTD, reducing your Mean Time To Recover is not a "one and done" process. Patience is necessary, as sometimes what seems like an easy fix can spiral off into hundreds of hours of work to resolve. The Log4j vulnerability is a great example of a problem (a detected threat) that has a rather large MTTR for organizations, as it requires patching, reporting, and verification that the patching has occurred across your entire network. The "threat" can't be considered resolved until your systems are once again protected from it as a potential vulnerability.
Reducing MTTR is much harder than reducing MTTD. With detections, you can ideally test response times directly, but with recovery you have to simulate numerous unknown variables in order to calculate or test a response. A good way to gauge how quickly your organization can handle a major breach or vulnerability is to look at how long it takes to verify that a Microsoft patch deployment is working properly, or to look at the recovery time involved with the aforementioned Log4j issues. Here are some other considerations to look at:
The final point here is one that has led to London Security Solutions saving impacted customers hundreds of thousands of dollars, by being on call when a breach occurs, or by acting as a managed service provider who handles the threat from first detection. We have repeatedly acted as the first response to security threats, and reduced the time it takes to handle security events by having a highly trained, skilled, and experienced SOC team that stands watch 24 hours a day, 7 days a week, 365 days a year. While they are assisted and backed by an AI and ML driven threat intelligence engine, the true threat hunting and response are performed by actual humans. Our analysts become experts on your unique environment, so they know when there’s an anomaly and can investigate and correlate data to ensure your network remains secure. They can act when action needs to be taken, or alert you when additional steps may be required.
If you’re ready to see how London Security’s MDR can help reduce your MTTD & MTTR and increase your overall security posture, start with a risk-free 15-day FREE Proof of Value. Don’t take our word for it - see for yourself. Fill out the form below to get started.
As a note - this is part 2 of a 4-part series. If you don't want to wait for all the blogs to get posted, use the form below and we'll go over it with you.