Reliability and Your Security Model

We've all heard the questions "What keeps you up at night?" and "What's your biggest concern regarding cybersecurity?"  While each of us has our own answers to these questions, and those answers vary, in the end the overwhelming theme boils down to one thing: reliability.

For true reliability – let AI enhance, not replace, your security staff

Over the years I've had IT managers tell me they fear existing security solutions won't stop attacks on their own, but they can't afford to hire or train additional resources.  

Why?  Probably because their executives believe everything they heard in the sales pitch and think that because all of their systems are automated and using AI, they have nothing to worry about.  That's why most upper managers and executives tend to deny security headcount and cut security budgets...that is, until something happens.

Talk about opposite ends of the spectrum!  Most of us share the same fear: someone failing to respond to an alert or notification, because we understand that automation alone won't solve the issue or actually stop an attack.

Alerts are great...IF they generate a response

We have a customer whose executive team thinks the alerts are good enough, yet their security engineers have stated (and I quote): "I won't wake up if I get a text at 1am.  These alerts won't be seen until my alarm goes off and I have a cup of coffee."

How's that for self-admitted failure?  So they adopted an AI solution for quicker responses, but which responses should be automated?  At what point do AI and those automated responses turn into a bigger problem, causing larger issues or worse...downtime?
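
For what it's worth, here's one way to draw that line, sketched in Python with entirely hypothetical action names and paging hooks: automate only the low-blast-radius responses, and route anything that could cause downtime through a human, escalating if nobody acknowledges.

```python
# Minimal sketch of tiered response automation. The action names, the
# paging hook, and the policy itself are hypothetical; the point is the
# shape: automate the safe stuff, gate the risky stuff behind a human.

AUTO_APPROVED = {"quarantine_file", "block_hash", "isolate_endpoint"}
REQUIRES_HUMAN = {"disable_vlan", "shut_down_server", "revoke_all_sessions"}

def page_on_call(message: str) -> None:
    """Stand-in for a real paging integration (phone call, SMS, chat)."""
    print(f"PAGE: {message}")

def handle_response(alert_id: str, action: str, acknowledged: bool) -> None:
    if action in AUTO_APPROVED:
        # Low blast radius: safe to automate.
        print(f"[{alert_id}] auto-executing: {action}")
    elif action in REQUIRES_HUMAN and acknowledged:
        print(f"[{alert_id}] human approved, executing: {action}")
    else:
        # The 1am-text problem: no acknowledgment means escalate,
        # never silently execute (or silently drop) the response.
        page_on_call(f"[{alert_id}] '{action}' needs approval, escalating to secondary on-call")

handle_response("ALRT-1042", "quarantine_file", acknowledged=False)
handle_response("ALRT-1043", "shut_down_server", acknowledged=False)
```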

A lesson from RoboCop

The movie RoboCop (1987) contains quite possibly the best depiction of failed security automation on film.  There are others, but this one is an absolute classic.

The Enforcement Droid Series 209 (ED-209) was designed to be a fully automated peacekeeping machine.  The units were programmed for urban pacification, but Omni Consumer Products (OCP) had also negotiated contracts with the military for use in war.

Apparently the ED-209's weakness was its logic circuits.  It simply couldn't process information as quickly as a human brain, and its poor design prevented it from successfully maneuvering through an urban landscape.

It also suffered from a manual override vulnerability which allowed an unarmed and somewhat skilled hacker to access its command system and take full control of the mech.  Oh man...talk about security and design deficiencies!

Key takeaway

As Allie Mellen from Forrester stated in her February 2nd blog:

"The core capabilities of human beings are AI’s blind spots; “humanness” is simply not yet (or possibly ever) replicable by artificial intelligence. We have yet to build an effective security tool that can operate without human intervention. The bottom line is this: Security tools cannot do what humans can do."

What your vendor tells you versus reality

In post-mortems following recent attacks, I've heard people say things like "our vendor assured us we were protected against this."

Of course they did; that's their job: to get you to purchase their product.  They'll tell you what they want you to hear and even have an engineer show you a demo of their product doing just that.

You'll never hear a salesperson say anything about what their product or solution can't do, or where it falls a little short in protecting against something.  Then after an incident, when you're working with support, they'll say you or your team weren't using it correctly, it wasn't configured properly, it wasn't patched, or whatever.  It'll never be their fault!  It's always something you did or didn't do.

So what should you do?

Back to reliability.  You have a stack of security solutions.  Some solutions in that stack depend on other solutions doing their job(s) without error.  We could spend hours walking the stack in from the cloud or out from your core, but that could build a list of technologies and solutions you aren't budgeted for at this time, and I'm writing a blog - not a book 😉.
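
Still, that dependency point is worth making concrete.  Here's a rough sketch in Python, with a made-up stack and dependency map, of how one quiet failure degrades everything downstream of it:

```python
# Rough sketch: model the security stack as a dependency graph and see
# what a single failure takes with it. The components and the FEEDS map
# are made up; substitute your own stack.

FEEDS = {
    "firewall": ["siem"],        # firewall logs feed the SIEM
    "edr": ["siem"],             # so does endpoint telemetry
    "email_gateway": ["siem"],
    "siem": ["soar"],            # SIEM detections drive SOAR playbooks
    "soar": [],                  # end of the chain
}

def impacted_by(failed: str) -> set[str]:
    """Everything downstream of a failed component is silently degraded."""
    hit, stack = set(), [failed]
    while stack:
        for downstream in FEEDS.get(stack.pop(), []):
            if downstream not in hit:
                hit.add(downstream)
                stack.append(downstream)
    return hit

# A quiet EDR failure blinds the SIEM detections and the SOAR playbooks
# that depend on them, with no alarm of its own.
print(impacted_by("edr"))  # {'siem', 'soar'}
```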

(You also shouldn't take a minimalist approach towards any of those layers.  That might be a topic for a later blog.)

Don’t forget the human factor

Like you, I've attended webinars and presentations about "core to cloud" and "edge to endpoint"...but none of them want to talk about the human factor.  

These days it seems as if every vendor is only interested in telling us how their solutions have "evolved with AI" and can solve our security and staffing problems with automated responses, thanks to machine learning and architecturally integrated platforms (think XDR).  That's all great until one of the pieces has an issue...then it's a chain reaction of failures.  (Remember the ED-209!)
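
One defensive pattern for that chain reaction, sketched below with hypothetical steps and thresholds: wrap the automated response chain in a circuit breaker, so repeated failures halt the automation and page a human instead of compounding into downtime.

```python
# Minimal circuit-breaker sketch around a chain of automated responses.
# The threshold and steps are hypothetical; the idea is that repeated
# failures halt the chain and hand control to a human, rather than
# letting one bad component cascade.

class ResponseChain:
    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0
        self.halted = False  # halted means automation stops, humans take over

    def run_step(self, name: str, step) -> None:
        if self.halted:
            print(f"skipped {name}: chain halted, awaiting human review")
            return
        try:
            step()
            print(f"{name}: ok")
        except Exception as exc:
            self.failures += 1
            print(f"{name} failed: {exc}")
            if self.failures >= self.max_failures:
                self.halted = True
                print("too many failures: halting automation, paging on-call")

chain = ResponseChain(max_failures=2)
chain.run_step("isolate endpoint", lambda: None)
chain.run_step("block indicator", lambda: 1 / 0)   # simulated failure
chain.run_step("disable account", lambda: 1 / 0)   # trips the breaker
chain.run_step("wipe device", lambda: None)        # never runs
```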

Texas Freeze 2021 – the importance of reliability

Rather than bashing specific vendors, let’s look at another example of how reliability must become the talking point in security.  Consider the Texas power grid.  

As a Texas resident, I had questions about the February 2021 Freeze that led to a power fiasco.  I sat down with Texas State Senator Bob Hall, who explained what happened and how the cascade of failures nearly destroyed the Texas grid.  

He stated it wasn't one specific point, segment, or sector that was to blame.  It started small and cascaded into a chain of failures: stations that needed to be brought online to generate power for other stations were themselves without power.  They couldn't get them started!

It wasn't the frozen wind turbines, wasn't the underpowered solar arrays, wasn't any of the individual systems that caused the issues.  

It was how each failure impacted the others and compounded into the overall failure.  And because of the unreliable nature of the automated processes, the situation didn't stabilize until humans got involved and stopped those processes.

That bears repeating: the situation was not resolved until humans got involved to stop the automated processes.  There's much more to it than this, but I'm only using the cascading failures for my example.

Human intelligence saved the Texas power grid

People had to think two and three moves ahead: where the issues were, where they were heading, and then ultimately set things in motion to accommodate some of the remaining automated measures while bringing systems back online.

Therefore, it was a combined human-machine response that ended up keeping the grid from becoming a total loss.

What does this mean for your business?

The same approach should be taken with cybersecurity – you need a combined human and machine response to security threats.  

A fully automated SOC may sound great and look good on paper, but in the end, you're going to need an analyst to verify or validate.  As Allie's blog is so aptly titled - Stop Trying To Take Humans Out Of Security Operations.


ABOUT THE AUTHOR

Dennis London is the President of London Security Solutions.  His IT security career spans 30 years, with an extensive history of firsthand experience with malware, attacks, and breaches in organizations of all sizes, including being one of the first responders to assist Sony after their November 2014 attack.  Dennis built the London Security team around the needs of its customers.  Today London Security is an award-winning, nationally recognized leader in managed security and MDR/SOC services.


Extra resources for your reading pleasure:

Amazon discusses reliability as one of the five pillars of its Well-Architected Framework.  https://wa.aws.amazon.com/wat.pillar.reliability.en.html

In Systems Engineering, we're taught about Reliability, Availability, and Maintainability.  Here's a link to the Systems Engineering Body of Knowledge (SEBoK) for the definitions and explanations (https://www.sebokwiki.org/wiki/Reliability,_Availability,_and_Maintainability#Reliability)

I mentioned Allie's blog and linked to it above, but here it is again: https://go.forrester.com/blogs/stop-trying-to-take-humans-out-of-security-operations/