Vulnerability Management Blog

Removing the Mystery from AI: Easier Said Than Done

One of the challenges encountered by all software companies that have built AI or machine learning into their products is eliminating “black box” syndrome, or the perception that inputs are consumed by the AI black box and, magically, reliable results are spit out. The results are valid, useful, and can be relied upon to make critical business decisions. Some time ago, we at Delve set out to provide a window into our prioritization “wizardware” for our customers that was informative, understandable, and user-friendly. We discovered that it wasn’t as straightforward as we might have expected.

Even though it was only introduced a few months ago, I can't imagine Delve without our contextual prioritization graph anymore. There are many things to love about it. It's easy to understand, yet it accurately reflects a very complex process. It looks like an obvious and natural decision, and yet it was not our first attempt at explainability.

From the moment we began introducing contextualized vulnerability ranking in the product, we wanted to explain why things were the way they were. A number is not enough. To be trusted, the factors impacting the result had to be transparent. We began with a list of factors and threw in a few bar graphs. It was OK with a few factors, but it was never great. As we kept improving the engine and adding more prioritization factors, the limitations became more apparent... for those who could venture deep enough into the application to even see them.

We wanted the explanations to take center stage and not be hidden. I would love to say that we tasked our team with creating the dashboard widget shown in the figure here.

If that were the case, this would have been a much shorter story, but these drawings are actually quite far into the reflection.

Digging into this blog's history, you may find some posts by Serge-Olivier with early attempts at explaining the score calculation and how specific factors impact it. It was the topic of ongoing whiteboard discussions. Eventually the “flamethrower” surfaced. I believe it was a somewhat unrelated attempt to validate that our score distributions were working as expected. The idea was attractive from the start. However, the early renderings were not as clean and raised more questions than they answered.

At the time, calculated scores were unbounded. Looking back, it was obviously a problem. No one knew if 1300 was a high score. The truth is that it varied over time as our factors evolved. In every attempt to graph this flamethrower on a whiteboard, the only way it made sense was if the scales on both sides were the same. However, that did not make much sense mathematically, and wasn’t consistent with the way we calculated the aggregate scores.

This is about when the above sketch came up. We had a valid solution for the groupings on the horizontal scale, a solution for the excessive number of lines to render, and a real consensus between research, user experience, engineering, customer success and marketing, if only we could normalize those values. It was time to rip off the band-aid. Our unbounded score was wrong. It had to be out of 10. That meant changing much of the aggregation pipeline, giving up on the dream of commutative factors, and patching up a nearly infinite number of user interface details.
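To make the trade-off concrete, here is a minimal sketch of why bounding a score can cost you commutativity. It is not Delve's actual pipeline: the multiplicative factors, the clamp-after-every-step rule, and all names in it are assumptions chosen purely for illustration. With unbounded scores, the order in which factors are applied does not matter; once every intermediate result is kept within 0 to 10, it does.

```python
from functools import reduce

MAX_SCORE = 10.0  # the "out of 10" bound; the clamping rule itself is an assumption


def apply_unbounded(score, multiplier):
    """Hypothetical factor step with no upper limit on the score."""
    return score * multiplier


def apply_bounded(score, multiplier):
    """Same hypothetical step, but the result is clamped to [0, MAX_SCORE]."""
    return min(MAX_SCORE, max(0.0, score * multiplier))


def aggregate(base, multipliers, step):
    """Apply each factor in order, starting from the base score."""
    return reduce(step, multipliers, base)


base = 8.0
raise_then_lower = [2.0, 0.5]  # an aggravating factor, then a mitigating one
lower_then_raise = [0.5, 2.0]  # the same two factors in the opposite order

# Unbounded: both orders land on the same value.
unbounded_a = aggregate(base, raise_then_lower, apply_unbounded)
unbounded_b = aggregate(base, lower_then_raise, apply_unbounded)

# Bounded: the first order hits the ceiling before the mitigation applies,
# so the two orders no longer agree.
bounded_a = aggregate(base, raise_then_lower, apply_bounded)
bounded_b = aggregate(base, lower_then_raise, apply_bounded)

print(unbounded_a, unbounded_b)  # 8.0 8.0
print(bounded_a, bounded_b)      # 5.0 8.0
```

In a scheme like this one, the order of the factors becomes part of the design, which is one way to read "giving up on the dream of commutative factors."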

We figured out the normalization, did the work, altered the pipeline, created a brand new graph component, and clarified the product interactions iteratively. The rest is history. The graph looks obvious and natural for the data, not because it was, but because we reframed the data to fit our goals. Our product today includes an interactive version of the contextual prioritization progress graph, which gives our customers graphical and text-based detail on how the prioritization score for each vulnerability was calculated and which factors impacted the score most prominently. Goodbye, black box. Hello, AI transparency. When the goal is worthwhile, nothing should be kept off the table.
