AI in Cyber Security is Dead...Long Live AI in Cyber Security (Part 2)
May 12, 2020
In Part 1 of this blog post, we discussed the evolution of AI, AI hype, and the Gartner hype cycle and AI's position in it, among other topics. In Part 2, we'll extend that discussion to include a number of additional AI topics, culminating in 4 questions we recommend you ask your AI vendor before deploying their technology.
Real world, practical examples of AI today
Well, it turns out there are some real examples of the practical use of machine learning in cybersecurity, and they've been around for many, many years. The first one, of course, is spam management. Multiple machine learning algorithms have been used by different email providers for a long time. Gmail has been using them for 15 years to almost completely eliminate spam from inboxes. A second example is the use of machine learning by antivirus companies. I'm not referring to the next-gen AVs we've been hearing about recently, but to malware sample triaging in their backend systems. They've been doing this for more than 25 years. And the last example of successful use of ML is not directly in cybersecurity, but in a related field: fraudulent transaction detection. These technologies actually use deep learning and they've been used by banks and payment processors for more than 20 years now. These are clear examples that demonstrate that machine learning can reach the plateau of productivity, but we tend to forget about them.
The AI Effect and Tesler’s Theorem
There’s actually a term for this: it's called the A.I. Effect, more precisely known as Tesler’s Theorem. Larry Tesler, a famous computer scientist who worked at Xerox PARC, the famous research center, back in the day, said something very simple about A.I., but also very real: intelligence is whatever machines haven't done yet. And there's definitely a human bias where, as soon as machine learning matures to usefulness in people’s minds, it stops being A.I.; it stops being intelligence. That's a bit why, here at Delve, we like to use the term “Wizardware” for anything that is machine learning-related: it confers on it the magical aspect that we think it truly deserves. So with all these new machine learning techniques, there are some elements that are, in my opinion, fundamental in reaching these productivity plateaus.
Trusting AI: explainability and interpretability
One of the most important things that needs to be recognized is the issue of trust. That applies to anything that uses ML. To build that trust, two core elements need to exist: explainability and interpretability. Machine learning systems and models should be able to explain what decisions they made and for what reason, and humans need to be able to interpret these decisions and understand how the system sees the world. These two elements are really core because if you're facing a complete blackbox, you can't challenge it. You can't correct it. It becomes a kind of oracle. You have to act blindly on the decision output. It's also important to consider that we're humans. We need to get a feeling for how the system works, and we're very much visual creatures, so visual renderings are usually a good way to do so.
It would have been very hard for self-driving car manufacturers to actually earn the trust that the cars wouldn't kill their passengers if they didn't have public displays of the image that I have here on the upper right corner that shows clearly how the car sees the world.
It makes it evident that this system is actually able to understand the world around it, and that it won’t kill you in the first ditch it comes across. So there are tons of techniques that exist to build and expand trust. The first and most obvious one, of course, is building custom data visualizations. There are also more advanced techniques, and we're going to talk a bit about these.
The first one that I want to talk about is saliency maps. Saliency maps, like the one we see on the right, give a good sense of which pixels in an image have more impact on a machine learning system's output than others.
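To make the idea concrete, here is a minimal sketch of a saliency map. It is an illustration under stated assumptions, not how any particular product computes saliency: the 3x3 "model" and its WEIGHTS are hypothetical stand-ins for a trained network, and the sensitivity of each pixel is approximated by nudging that pixel and measuring how much the score changes. Real systems typically use gradients computed by the network itself.

```python
# Hypothetical toy "model": scores a 3x3 grayscale image with fixed weights.
# The weights stand in for a trained network (an assumption for illustration).
WEIGHTS = [
    [0.0, 0.1, 0.0],
    [0.1, 2.0, 0.1],
    [0.0, 0.1, 0.0],
]


def model_score(image):
    """Return a scalar score for a 3x3 image given as a list of rows."""
    return sum(WEIGHTS[r][c] * image[r][c] for r in range(3) for c in range(3))


def saliency_map(score_fn, image, eps=1e-3):
    """Approximate |d score / d pixel| for every pixel by finite differences."""
    base = score_fn(image)
    saliency = []
    for r, row in enumerate(image):
        out_row = []
        for c, _ in enumerate(row):
            perturbed = [list(x) for x in image]  # copy, then nudge one pixel
            perturbed[r][c] += eps
            out_row.append(abs(score_fn(perturbed) - base) / eps)
        saliency.append(out_row)
    return saliency


image = [[0.2, 0.5, 0.1], [0.4, 0.9, 0.3], [0.0, 0.6, 0.2]]
smap = saliency_map(model_score, image)
for row in smap:
    print(["%.2f" % v for v in row])
```

In this toy case the center pixel comes out as by far the most salient, which is exactly the kind of information a human reviewer can use to check whether the model is looking at the right part of the image.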
For interpreting how neural nets analyze images internally, there is a well-known and good method called Activation Atlases, proposed by OpenAI. These are very useful. The image here in the center shows what a machine learning system “sees internally” when it tries to tell the difference between a wok and a frying pan. You can see the subtle images that the network has considered, and in the upper right corner there are noodles that are indirectly associated with recognizing a wok rather than a frying pan. Of course, this association doesn't make any sense by itself, but it's probably there because this neural net was trained on a specific set of labeled images that had woks with noodles in them, so it encoded this association internally. This technique helps show the areas where improvements can be made against bad classifications.
Kasparov and Lee Sedol versus AI
Now, let's go back for a few seconds to our original machine learning examples, which are good examples of how trust and understandability can have a real-life impact: Deep Blue versus Kasparov and AlphaGo versus Lee Sedol. I talked about move 36 of the second game, where Deep Blue made a move that destabilized Kasparov. In a sense, it broke the trust that Kasparov had in the system. Up to that point, if you look at the games, Kasparov was easily leading the system into making bad decisions, mostly because these algorithms back then were pretty greedy. So he could decide where to lead the algorithm. But on move 36, the system did not follow Kasparov’s lead at all. It played a very smart move and that confused Kasparov.
He lost the game by making mistakes afterwards. At the time, he really felt that IBM had some professional chess player intervene; he thought IBM had cheated. He was convinced that the machine couldn't have come up with this move on its own. And of course, IBM was very smart, and they played the blackbox card. They didn't release any logs of the machine, just to further destabilize Kasparov. And, potentially they discussed it in the tournament. We only learned more later: in 2012, Nate Silver, the famous statistician, published a book for which he interviewed the original Deep Blue coders. It turns out this was a bug. The machine just picked a move at random out of possible interesting moves. That's it. So, in my opinion, this is a good example of how a lack of trust can create problems down the line, especially with a human element.
For AlphaGo, there is another interesting situation that relates more to the understandability of the machine. In Game 2, AlphaGo made a very strange move, move 37. The analysts there were baffled. They called it a slack or novice move, and Lee Sedol was incredibly surprised by this move as well. It turns out this one was incredibly intelligent. Some now call it beautiful. But at the time, it was mystifying. By playing this move, AlphaGo actually created a network of stones that ended up giving it a huge advantage. There is a documentary on this, by the way, called AlphaGo, that was filmed during the matches. You can actually see the monitoring screens of the Google employees who supervised AlphaGo, and the estimated probability of winning jumps significantly higher after this move is played. But everyone in the room is baffled and confused. Why was this move played? And you know what's important? The takeaway here is that no one understood. Actually, this specific move helped the Go community devise new strategies. It opened up a new era of Go playing with different optimization strategies for victory. It changed the way humans see the game and strategize. And now people have more techniques that were given to them by this machine. But it didn’t change our inability to beat machines at it.
AI and Gartner’s Productivity Plateau
But what's clear from all these examples, and what I want you to take away from this part too, is that there is a way to reach a productivity plateau for using machine learning in cybersecurity. I think this productivity plateau will be reached first by imagineering systems that are actually able to collaborate constructively with humans and supplement human capabilities. And this brings me to the last part of my presentation: the four questions you should ask your AI cybersecurity vendor. These four questions, of course, are meant to help you level the playing field and take a more rational approach to machine learning in cybersecurity. But before we begin, let's do a quick recap of the role of machine learning and how it differs from traditional computing.
Machine Learning Primer
The objective of machine learning is to learn some facts about a population of data. To learn these facts, it first has to be trained on a sample of data from this population. Compare this with traditional computing, here on the left, where an algorithm processes data to produce an output and must have a specific, predictable way to process every different type of data that it's fed.
The machine learning approach, on the right here, starts by using a specific machine learning algorithm on a subset of data to train a model. This model is then used on new data, potentially much more data than was in the training set, to generate facts as an output. These facts will then potentially be the basis of actions that you take in your product. So it's clear from this construction that the quality of the data is paramount to producing a model that has some usefulness.
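The contrast between the two approaches can be sketched in a few lines. This is a deliberately tiny illustration, assuming a hypothetical spam-like feature (a count of suspicious links per message) and a one-parameter "model" that is just a learned threshold; none of these names or numbers come from a real product.

```python
def traditional_rule(num_links):
    """Traditional computing: a hand-written rule with a fixed threshold."""
    return num_links > 5


def train_threshold(samples):
    """'Training': pick the threshold that best separates the labeled sample.

    samples is a list of (feature_value, label) pairs; label True means spam.
    """
    candidates = sorted({x for x, _ in samples})
    best_threshold, best_correct = None, -1
    for t in candidates:
        correct = sum((x >= t) == label for x, label in samples)
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold


def predict(threshold, value):
    """Apply the trained model (here, only a threshold) to new data."""
    return value >= threshold


# Hypothetical labeled training sample drawn from the population.
training_sample = [(0, False), (1, False), (2, False),
                   (3, True), (4, True), (7, True)]
model = train_threshold(training_sample)

# New, unseen data: the model generates "facts" about it.
new_data = [1, 3, 10]
facts = [predict(model, v) for v in new_data]
print(model, facts)
```

Note that the hand-written rule would miss the message with 3 links, while the learned threshold catches it. The learned threshold is only as good as the sample it was trained on, which is the point about data quality.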
How do you justify the use of machine learning?
Constructing a model with the right set of algorithms is also very important, because a wrong model will produce wrong facts about the newly analyzed data. And this is what is reflected in my four questions, beginning with question number one: how do you justify the use of machine learning in the first place? That's the first question that should be asked. Why is using machine learning in that context the right option versus just using a simple heuristic approach? And if machine learning is the right approach, how is your model linked to your business objectives? Generally speaking, if you can't justify the use of machine learning, then just don't use machine learning. If you're trying to optimize a process, automate some behavior, or handle some decisions on large data sets for which writing an algorithm that can handle all use cases is not practical, then maybe machine learning is the right choice. But there are other use cases where that is not true, and where a simpler series of if-then-else statements could do the trick even better. So, once you're satisfied with the fact that you need machine learning, you've got to have data.
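One hedged way to put this first question into practice is to measure a simple if-then-else baseline and the candidate model on the same held-out data, and only accept the model if it wins by a meaningful margin. Everything below is hypothetical: the failed-login feature, the stand-in candidate_model, the held-out examples, and the MIN_GAIN margin are assumptions chosen for illustration.

```python
def accuracy(predict_fn, labeled):
    """Fraction of (value, label) pairs that predict_fn gets right."""
    return sum(predict_fn(x) == y for x, y in labeled) / len(labeled)


def heuristic(failed_logins):
    """Simple if-then-else baseline: flag 10 or more failed logins."""
    return failed_logins >= 10


def candidate_model(failed_logins):
    """Stand-in for a trained model (assumption: it learned a cutoff of 8)."""
    return failed_logins >= 8


# Hypothetical held-out data: (failed_logins, was_attack).
holdout = [(2, False), (5, False), (9, True), (12, True),
           (15, True), (7, False), (11, True), (3, False)]

MIN_GAIN = 0.05  # assumed minimum improvement that justifies the extra complexity

baseline_acc = accuracy(heuristic, holdout)
model_acc = accuracy(candidate_model, holdout)
use_ml = (model_acc - baseline_acc) >= MIN_GAIN
print(baseline_acc, model_acc, use_ml)
```

If the model cannot clearly beat the heuristic on data it has not seen, that is a strong hint that the simpler approach is the right one.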
How do you justify the soundness of your machine learning data set?
And this leads to my question number two, which has to do with the quality of the data set on which the model is trained. Can you justify the soundness of this data set? If you remember my introduction slides from this section, you are going to create a model by learning on initial data sets. How well does your training sample actually represent the underlying phenomenon that you're looking to analyze? You want to solve a specific problem. How well does this data set represent something that can be learned about this problem? And how do you make sure that the distribution of this sample is actually well-balanced, so that what is learned on it will be applicable to the rest of the population? This is exactly the problem that plagued Watson. It was trained on synthetic cancer data, then used on real patients. Guess what? It didn't work so well.
So it is crucial to compare the environment in which we've trained the model with the one in which we're applying it to solve the problem. This can even create some security side effects. Let's say you built a model by training on data with very imbalanced sourcing. One source, for instance one client, has a very large presence in this training data. Well, you're going to deploy this model on other clients, and this can potentially leak information to those other users: they'll see end results and facts that are skewed by the initial model training, and from them they can infer some information about the original data set.
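A very basic check for the imbalanced-sourcing problem is to measure how much of the training data each source contributes before training. The sketch below assumes each record carries a hypothetical "client" field and uses an assumed 50% ceiling; both are illustrative choices, not a recommendation.

```python
from collections import Counter


def source_shares(records):
    """Return the fraction of training records contributed by each client."""
    counts = Counter(r["client"] for r in records)
    total = sum(counts.values())
    return {client: n / total for client, n in counts.items()}


def dominant_sources(records, max_share=0.5):
    """List clients whose share of the training data exceeds max_share."""
    shares = source_shares(records)
    return sorted(c for c, s in shares.items() if s > max_share)


# Hypothetical training set: client A contributes 8 of 10 records.
records = [{"client": "A"}] * 8 + [{"client": "B"}] + [{"client": "C"}]

print(source_shares(records))
print(dominant_sources(records))
```

A flagged source does not automatically make the data set unsound, but it is a prompt to rebalance, resample, or at least understand what the model will inherit from that one client.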
Avoiding model feedback loops and the echo chamber
And a special case of this is actually my next question. Your model, and the facts that you're learning about the population, will eventually have some consequences. You will learn some facts and potentially you'll change some user behavior. This in turn will create some new data points in your data set, and these will be learned upon. So how do you avoid creating an echo chamber? And how are these data points accounted for in future predictions? You can think of this in terms of filter bubbles, the subject of a famous TED talk. Facebook and Google, for instance, end up showing you more of the same information that you typically like. This keeps you from having access to a well-distributed set of information, and may create a skewed view of the world. It is exactly the same with a trained model. It gradually skews toward its own data points, and it will end up learning wrong facts about the population. So avoiding feedback loops in the model is paramount.
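The echo chamber effect is easy to reproduce in a toy simulation. In the sketch below, a hypothetical recommender always shows the topic with the most recorded clicks, and users can only click on what they are shown, so the model's own output becomes its next training data. The optional explore_every parameter is one assumed mitigation (periodically showing topics in rotation regardless of click counts); the topics and starting counts are invented for illustration.

```python
def simulate(rounds, explore_every=0):
    """Return each topic's share of recorded clicks after the simulation.

    With explore_every=0 the system always shows its current favorite.
    Otherwise every explore_every-th round shows a topic in rotation.
    """
    clicks = {"news": 6, "sports": 5, "science": 5}  # nearly balanced start
    topics = sorted(clicks)
    for step in range(1, rounds + 1):
        if explore_every and step % explore_every == 0:
            shown = topics[(step // explore_every) % len(topics)]
        else:
            shown = max(topics, key=lambda t: clicks[t])
        clicks[shown] += 1  # users can only click on what they are shown
    total = sum(clicks.values())
    return {t: clicks[t] / total for t in topics}


closed_loop = simulate(100)
with_exploration = simulate(100, explore_every=2)
print(closed_loop)
print(with_exploration)
```

A one-click head start turns into a dominant share in the closed loop, while the run with exploration stays noticeably less skewed. Real systems need more careful handling, such as tracking which data points were influenced by the model's own output, but the mechanism is the same.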
Can you avoid domain shifts in your training data?
And this is somewhat related to the next question, question number four. Can you avoid domain shifts in your training data? Over time, some distance can be created between the training data and the data being predicted upon. That's domain shift. And of course, this will lead the model to the wrong conclusions. So it's critical that you make sure that this potential distance is accounted for. And while domain shift can happen gradually, it can also happen with a sudden spike of bad training data: data set pollution. Let's say you have a model that learns on existing client data, and then you sign a much larger client and integrate their data into the model. Well, suddenly, you have a spike of bad training data that could pollute your model and therefore its conclusions. So this needs to be managed. And that was my fourth question.
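One common way to put a number on the distance between training data and the data being predicted upon is the Population Stability Index (PSI), which compares how the two samples fall into the same set of bins. The sketch below is a minimal version with assumed bin edges and invented sample values; the 0.25 alert level is a widely used rule of thumb, not a universal constant.

```python
import math


def psi(expected, actual, edges):
    """Population Stability Index between two samples over shared bins.

    edges are the interior bin boundaries, in ascending order.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            idx = sum(v >= e for e in edges)  # index of the bin v falls into
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature values seen at training time and in production.
training_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
live_values = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
edges = [3.5, 6.5]

no_shift = psi(training_values, training_values, edges)
shift = psi(training_values, live_values, edges)
print(no_shift, shift, shift > 0.25)
```

Running a check like this on a schedule, and again whenever a large new data source is integrated, gives an early warning for both the gradual and the sudden kind of shift.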
A better understanding of machine learning
With these four questions (and their sub-questions), I hope that you're going to be able to approach security vendors with a much better sense of what machine learning can bring and how it can be used, and that you'll be in a position to have a much more challenging discussion and avoid the fairy dust approach to machine learning and cybersecurity. I hope I was able to clarify a bit the importance of machine learning models, and how both the data and the model matter.
And I know that if the use of machine learning is a bit better understood, it's going to be much easier to rationalize. So to summarize this presentation: in my opinion, machine learning is inevitable. It's not going to be a silver bullet, and cybersecurity has to find specific use cases for it. It probably won't be sexy, but it's necessary. There are just not enough people to do the work, and there is clearly a need for machine learning in cybersecurity. But we need less hype. We need a more rational and collaborative approach with experts in order to build trust and understandability, with models that use expert feedback to improve. And of course, you have to keep the vendors accountable for what they say.