“Incident management as we know it is broken.”
That’s the conclusion I came to when I took over Zepko’s SOC team in 2015. We had a SOC filled with some of the brightest minds in the country, and a technology portfolio that would make Gartner “top rights” shake in their boots. Alerts rolled in through our wallboards, analysts diligently followed their pre-defined process, we caught attacks, we kept businesses safe, and we were winning. Over the past 3 years as a specialist, I had fought off nation state attacks from allied and enemy nations, worked on military research projects, secured everything from FTSE 100 companies to small mom-and-pop businesses, and scuppered millions of script kiddies across the globe. Our problem wasn’t security, it was process. The cracks began to show in the team when an analyst was closing the 4024th repeat false positive alarm, or spent days chasing an attacker whose location changed with the wind.
We have all heard the anecdote about “screening fatigue”, where an airport security guard subjected to a prolonged period of mundane, unthreatening bags becomes desensitized to the point where they are unable to identify explosives and perform the role they were hired for. Most security operations centers have a similar problem, whether they are talking about it or not. Our analysts were being used to perform the same tasks over and over, without authority to change the process, until they became numb to them. It was time to change. This is the tale of how we shook up our SOC, and how you can shake up yours.
The first change was realizing that security is not done on a per-client, per-site, per-system, per-team basis. Security is a global concept. As you’re reading this, a team in China is reversing the latest Windows patch, a teenager in Kansas has just got to grips with how to throw a Nessus scan at your netblock, and a government agent in Iran has just found another way to backdoor your router. The landscape changes daily, and that’s why most of us got into this industry: to keep sharp. So why don’t our appliances and our teams? To address this, Zepko put Threat Intelligence at the heart of everything we do, or, to be more accurate, we eat, sleep and breathe it so that our clients don’t have to.
As SOC manager I took the analyst team and cut their role in half: 50 percent of their time was to be spent on securing companies, analytics and response, with the other 50 percent for research. What they researched was largely up to them, whether it was the newest malware sample, an organized crime unit, or a Windows exploit. If it was “InfoSec” and new, it was fair game. They were also allowed to work on themselves: some became certified pen-testers, one took up DevOps, some became vendor tech experts. We got better, and most importantly we got better together.
Alongside this, we consumed every piece of intelligence we could get our hands on. We consumed every OSINT feed, we scraped every open IRC channel we could find that was even vaguely related to hacking until they kicked our bots, we set up honeypots on 7 continents, we ran open proxies day and night, and did a whole host of other things I can talk about another day. At one point, we were even swapping threat intelligence “off the record” in train stations in the middle of the night. If there was a new actor, a new technique, or a blip on the radar, we wanted to know about it. We were hungry.
This was the first change: instead of the team looking at an attack as an “internal” or “external” threat, a “false positive” or a “true positive”, there was suddenly context. Ticket closure comments went from “Closed because regular internet scanning” to “Incident caused by Mirai Scanning from Chinese Actor <X>, actor discovered on Tuesday 4th January”.
Around this time we put all our chips into STIX and TAXII. Unifying the landscape was a bold promise, but these projects gave us hope. Our incidents were expressed in STIX format, our intel was converted to STIX Indicator format, and we offered up one of the first TAXII services to our UK customers. This was at a time when, during intel sales pitches, I still had to explain what STIX and TAXII were (unfortunately, I still have to today).
Kill Chains & Diamond Model
The single biggest “new-world” vs “old-world” security change occurred when we binned static runbooks and adopted these two processes at the heart of incident response. To this day, the first thing that an analyst is taught on day one in the Zepko SOC is the following two PDFs:
For those of you who don’t like academic white-papers, the good folks over at ThreatConnect have an amazing video on these techniques: https://www.youtube.com/watch?v=0a7xzJcFDIk
These methodologies force analysts to think about the attacker, rather than just the incident at hand. Carrying out full investigations no longer meant “False positive” vs “True Positive”. The analysts had to think about who the attacker was, what their intention was, how they reached this point, what else and who else they were attacking, and more importantly, whether there was a way to block this specific actor from hitting us again.
I could write ten articles on how every analyst should know these inside out, but I will leave you to read the above papers if you’re still in the dark.
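For readers who think better in code than in white-papers, here is a minimal sketch of how the two models can sit side by side in a data structure. It assumes the seven phases of the Lockheed Martin kill chain and the four vertices of the Diamond Model; the class and function names (`DiamondEvent`, `missing_phases`) and the sample data are made up for illustration and are not taken from our platform.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class KillChainPhase(IntEnum):
    """The seven phases of the Lockheed Martin intrusion kill chain."""
    RECONNAISSANCE = 1
    WEAPONIZATION = 2
    DELIVERY = 3
    EXPLOITATION = 4
    INSTALLATION = 5
    COMMAND_AND_CONTROL = 6
    ACTIONS_ON_OBJECTIVES = 7


@dataclass(frozen=True)
class DiamondEvent:
    """One observed event: the four Diamond Model vertices plus a kill chain phase."""
    adversary: str        # who is behind it ("unknown" until attributed)
    capability: str       # tool or technique used
    infrastructure: str   # IP, domain, or other asset the attack came through
    victim: str           # targeted host, person, or organization
    phase: KillChainPhase


def missing_phases(events: List[DiamondEvent]) -> List[KillChainPhase]:
    """Return the phases with no observed event that come before the latest observed phase."""
    if not events:
        return []
    seen = {event.phase for event in events}
    latest = max(seen)
    return [p for p in KillChainPhase if p < latest and p not in seen]


# Hypothetical example: we saw a scan and a C2 beacon, but nothing in between.
events = [
    DiamondEvent("unknown", "port scan", "203.0.113.7", "web-01",
                 KillChainPhase.RECONNAISSANCE),
    DiamondEvent("unknown", "beacon", "198.51.100.23", "web-01",
                 KillChainPhase.COMMAND_AND_CONTROL),
]
for phase in missing_phases(events):
    print("No evidence yet for:", phase.name)
```

The gaps are the interesting part: each phase with no evidence is a question the analyst still has to answer about how the attacker got from one point to the next.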
Now we had an intel-driven, responsive and landscape-aware SOC, but our tools were out of sync. I saw complex Maltego diamond models condensed down to comments in an incident ticket, or uploaded as a file attachment. Our tooling had to change with our new workflow. Our in-house system at the time was called ODS:Desk. It allowed analysts to work through incidents and runbooks, and track everything in dashboards. But intelligence searches had to be done in a separate system, and incident correlation had to be done with copy and paste into Maltego or spreadsheets. Incident response still involved finding an analyst who knew that vendor technology and issuing a manual command. Enter I3.
I3 was our new scratch-built IRM platform, and it proved to be the missing piece of the puzzle. It was built from the ground up as a multi-tenant Platform-As-A-Solution for running security operations across multiple clients, sites, and technologies. Here are just a few of the features that revolutionized our ability to fight cyber threats:
I3 – Runbook Creator
I3’s runbook creator allowed analysts to create custom workflows on a per use-case basis, with if-then-else style logic. Actions could be locked to require manager or client approval, and mundane actions such as GeoIP lookups could be automated and performed instantly, before a human analyst could even register that a new incident had been created.
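To make the idea concrete, here is a rough sketch of a runbook with if-then-else steps and approval-locked actions. It is not I3’s implementation: the `Step` structure, the `run_runbook` function, and the stand-in GeoIP table are all hypothetical and exist only to show the shape of the logic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional

Incident = Dict[str, object]


@dataclass
class Step:
    name: str
    action: Callable[[Incident], None]
    condition: Optional[Callable[[Incident], bool]] = None  # None means "always run"
    otherwise: Optional["Step"] = None                      # the "else" branch
    needs_approval: bool = False                            # locked until approved


def run_runbook(steps: List[Step], incident: Incident,
                approved: FrozenSet[str] = frozenset()) -> List[str]:
    """Run steps in order and return the names of steps still waiting for approval."""
    pending = []
    for step in steps:
        take_branch = step.condition is None or step.condition(incident)
        chosen = step if take_branch else step.otherwise
        if chosen is None:
            continue
        if chosen.needs_approval and chosen.name not in approved:
            pending.append(chosen.name)   # locked action: do nothing yet
            continue
        chosen.action(incident)
    return pending


def geoip_lookup(incident: Incident) -> None:
    # Stand-in for a real GeoIP service: a hard-coded, made-up lookup table.
    table = {"203.0.113.7": "ZZ"}
    incident["country"] = table.get(str(incident["src_ip"]), "unknown")


def block_source(incident: Incident) -> None:
    incident["blocked"] = True


def add_note(incident: Incident) -> None:
    incident["note"] = "source not on watch list, no block requested"


runbook = [
    Step("geoip", geoip_lookup),                      # automated, runs instantly
    Step("block", block_source,
         condition=lambda i: i["country"] == "ZZ",    # if
         otherwise=Step("note", add_note),            # else
         needs_approval=True),
]

incident: Incident = {"src_ip": "203.0.113.7"}
print(run_runbook(runbook, incident))   # the block waits for a manager or client
```

In this sketch the enrichment step runs without anyone touching it, while the block is held back until its name appears in the `approved` set on a later run.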
I3 – Threat Intelligence
Threat intelligence was embedded directly into the platform in the STIX format, allowing analysts to pivot on any piece of information they deemed interesting without the need to look up intel in a separate platform. Our extensive threat intelligence database was now at the analyst’s fingertips whenever they needed it.
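A “pivot” is easier to show than to describe. The sketch below uses a plain in-memory index rather than real STIX objects, and the `IntelStore` class, actor name, and indicator values are invented for the example. It only illustrates the move from one observed value to everything else tied to the same actor.

```python
from collections import defaultdict
from typing import Dict, Set


class IntelStore:
    """Tiny two-way index between actors and the indicators attributed to them."""

    def __init__(self) -> None:
        self._actors_by_indicator: Dict[str, Set[str]] = defaultdict(set)
        self._indicators_by_actor: Dict[str, Set[str]] = defaultdict(set)

    def add(self, actor: str, indicator: str) -> None:
        self._actors_by_indicator[indicator].add(actor)
        self._indicators_by_actor[actor].add(indicator)

    def pivot(self, indicator: str) -> Dict[str, Set[str]]:
        """For one observed value, return each linked actor and their other indicators."""
        actors = self._actors_by_indicator.get(indicator, set())
        return {actor: self._indicators_by_actor[actor] - {indicator}
                for actor in actors}


store = IntelStore()
store.add("actor-x", "203.0.113.7")
store.add("actor-x", "evil.example")
store.add("actor-x", "198.51.100.23")

# An analyst sees one IP in an incident and pivots on it.
print(store.pivot("203.0.113.7"))
```

An unknown value simply returns an empty result, which is itself useful: it tells the analyst this is something new.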
I3 – Security Automation
I3 was one of the first (if not the first) platforms to ship with OpenC2 security automation commands. This removed the mundane analyst jobs of enrichment using third-party services like GeoIP, Whois etc. It also cut our response to critical incidents to under 60 seconds by actioning things like firewall blocks, AV scans and isolations automatically for the analyst once they had got to the root cause of an incident, without the need for them to remember vendor commands or log into a siloed system.
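If you have not seen OpenC2 before, a command is essentially an action plus a target, expressed in a vendor-neutral way. The sketch below builds a “deny this IP” command using field names from the published OpenC2 language specification (`action`, `target`, `ipv4_net`, `command_id`). The helper name `deny_ip_command` is made up, and the exact payloads I3 sends are not shown here, so treat this as an illustration of the idea only.

```python
import ipaddress
import json
import uuid


def deny_ip_command(ip: str) -> dict:
    """Build an OpenC2-style command asking an actuator to deny traffic for an IPv4 address."""
    network = ipaddress.ip_network(ip)     # raises ValueError on malformed input
    if network.version != 4:
        raise ValueError("this sketch only handles IPv4")
    return {
        "action": "deny",
        "target": {"ipv4_net": str(network)},   # CIDR notation, e.g. "198.51.100.23/32"
        "command_id": str(uuid.uuid4()),        # lets the response be matched to the command
    }


command = deny_ip_command("198.51.100.23")
print(json.dumps(command, indent=2))
```

The point of the format is that the analyst asks for the outcome (“deny this address”) and whatever sits in front of the firewall translates it into the vendor’s own syntax.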
I3 – Incident Explorer
Probably the feature we are most proud of is the Incident Explorer. This view showed the analyst the incident they were currently working on and its relation to every other incident in the platform. Showing similar incidents, or incidents related to affected hosts, allowed the analyst to quickly fill in gaps in the kill chain and see the “wider picture”.
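The simplest form of this kind of correlation is finding incidents that share a host or an indicator with the one on screen. The following is a minimal sketch of that idea, not the Incident Explorer itself: the `Incident` fields, the `related_incidents` function, and the sample incidents are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Incident:
    id: str
    hosts: Set[str] = field(default_factory=set)        # affected machines
    indicators: Set[str] = field(default_factory=set)   # IPs, domains, hashes


def related_incidents(current: Incident,
                      all_incidents: List[Incident]) -> Dict[str, Set[str]]:
    """Map each related incident id to the hosts and indicators it shares with `current`."""
    related = {}
    for other in all_incidents:
        if other.id == current.id:
            continue
        shared = (current.hosts & other.hosts) | (current.indicators & other.indicators)
        if shared:
            related[other.id] = shared
    return related


# Made-up incidents: INC-1 and INC-2 share a host, INC-1 and INC-3 share an IP.
incidents = [
    Incident("INC-1", {"web-01"}, {"203.0.113.7"}),
    Incident("INC-2", {"web-01", "db-02"}, {"evil.example"}),
    Incident("INC-3", {"mail-01"}, {"203.0.113.7"}),
    Incident("INC-4", {"hr-laptop"}, {"192.0.2.9"}),
]
print(related_incidents(incidents[0], incidents))
```

Returning what is shared, rather than a plain yes or no, matters: the overlap is what tells the analyst why two incidents belong in the same picture.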
I3 – SLA Tracking & Team Stats
As a SOC manager, SLAs are the one thing I will not budge on. Service level agreements ensure my team is responding to incidents at the speed that clients are paying for. In security these are even more important than in uptime/outage monitoring. I have seen attackers get in and out of a network in less than 60 seconds, and I have seen ransomware cripple a business in less than 15 minutes. It is vital that analysts not only begin investigating an issue in a timely manner, but actually do something about it in a timely way as well. Once an incident is opened in I3, the SLA clock begins, and it will count down until work on the incident begins. Automatic actions don’t stop the clock.
I3 then produces real-time statistics on my team, including the amount of time they spend in the platform, any SLA breaches, and their average response and resolution times. We are completely transparent about these figures with our clients, so they can log in at any point and check how long we are taking to respond to threats in their estate.
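The SLA clock rule described above (start on open, stop when work begins, ignore automatic actions) can be sketched in a few lines. The `SlaClock` class, the 15-minute target, and the timestamps are assumptions for the example, not I3’s actual values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SlaClock:
    opened_at: datetime
    response_target: timedelta
    work_started_at: Optional[datetime] = None

    def record_action(self, at: datetime, automated: bool) -> None:
        """Only the first human action stops the clock; automated actions are ignored."""
        if not automated and self.work_started_at is None:
            self.work_started_at = at

    def remaining(self, now: datetime) -> timedelta:
        """Time left on the SLA; frozen once work has started, negative when breached."""
        end = self.work_started_at if self.work_started_at is not None else now
        return self.response_target - (end - self.opened_at)

    def breached(self, now: datetime) -> bool:
        return self.remaining(now) < timedelta(0)


opened = datetime(2017, 4, 1, 9, 0, 0)                  # hypothetical incident
clock = SlaClock(opened, timedelta(minutes=15))         # hypothetical 15-minute target

clock.record_action(opened + timedelta(seconds=1), automated=True)    # GeoIP lookup
print(clock.remaining(opened + timedelta(minutes=10)))                # still counting down

clock.record_action(opened + timedelta(minutes=12), automated=False)  # analyst picks it up
print(clock.breached(opened + timedelta(minutes=30)))                 # clock stopped in time
```

Keeping automated actions out of the calculation is deliberate: the figure is meant to measure how quickly a human got eyes on the problem.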
It has been 3 months or so since we moved over to I3, and we are never looking back. By stripping away the mundane elements of the analyst role, we have filled their job with in-depth investigations into complex threats, communicating with clients, and sharpening their skillset: the stuff humans are really good at. We have a rule now that if you have to do something more than once, we should look at a way to automate it, and I stand by that as a good yardstick for how SOCs all over the world should be operating.
So why am I sharing this with you? After months of development testing we have opened I3 up to the rest of the world. If you are sick of using Remedy or ServiceNow or (god forbid) emails for managing your security incidents, you can now plug I3 into your security estate and get a tactical advantage over your adversaries.
As Zepko is a managed security company at its core, I3 comes with complete deployment and set-up assistance. Combined with the ability to fall back to our expert incident management team at any point, if you can’t cope with the volume of incidents or face a particularly tricky security issue, this means that we are poised to empower SOCs all over the world.
We are now scheduling I3 demos for those of you willing to face the brave new world.
To join the webinar on Thursday 27th April sign up here -> I3 Webinar – The Future Of Incident Response