Quantum Leap in Ransomware Tracking

Published on 22/01/2020
Located at the Inria research center in Rennes, Brittany, France, the High-Security Laboratory (LHS) is coming up with a game-changing approach to countering ransomware. Instead of relying on a signature database or scanning the system for suspicious behavior, the novel tool monitors how data is altered, looking for illegitimate encryption attempts. Called Data Aware Defense (DAD), this yet-to-be-marketed technology is patented jointly by Inria and DGA, the French defense procurement agency.
Illustration: ransomware (© Inria / photo C. Morel)

Sneaking onto people's computers one way or another, ransomware encrypts the victim's files before demanding money in exchange for the decryption key. Having first appeared about five years ago, this type of threat is fast becoming a nuisance of epic proportions. “What really triggered the epidemic is the advent of Bitcoin, which gave scammers the possibility of collecting ransom in a completely anonymous way without running the risk of getting caught,” says scientist Jean-Louis Lanet, Head of LHS, the High-Security Laboratory funded by Inria, CentraleSupelec, CNRS, the Brittany Region and the French defense procurement agency.
“Antiviruses work fine against ransomware as long as they know what they are supposed to be looking for. In other words, as long as they have an up-to-date database of malware signatures. But if they hit upon a ransomware heretofore unheard of, sure enough they won't be of any use.” Sensing the rumblings of the phenomenon back in 2014, LHS scientists thus decided to opt for an out-of-the-box approach.
“We are not interested in the ransomware's structure whatsoever, nor do we scrutinize its impact on the system. We solely focus on data alteration. If you have stored 10,000 JPEG files on your hard drive, we'll keep an eye on each of them, or more precisely, we'll start monitoring them at the very moment those files are being edited. We don't look at what they used to be, but we have a model of what they should look like after the edit. We have such mathematical models based on a Markov chain for every object in the system. If the data starts morphing into something that diverges from the model, then something suspicious is at play.”
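To give a feel for what "diverging from a model" can mean, here is a minimal Python sketch. It is not DAD's actual model, which the article does not detail: it assumes a first-order Markov chain over raw bytes trained on sample content, and the function names, the smoothing constant and the threshold are all hypothetical.

```python
import math


def train_byte_markov(samples, smoothing=1.0):
    """Estimate byte-to-byte transition log-probabilities from sample buffers.

    Assumption: a first-order chain over raw bytes with additive smoothing.
    """
    counts = [[0] * 256 for _ in range(256)]
    for data in samples:
        for prev, cur in zip(data, data[1:]):
            counts[prev][cur] += 1
    model = []
    for row in counts:
        total = sum(row) + smoothing * 256
        model.append([math.log((c + smoothing) / total) for c in row])
    return model


def mean_log_likelihood(model, data):
    """Average log-probability per transition; lower means less expected."""
    if len(data) < 2:
        raise ValueError("need at least two bytes")
    total = sum(model[prev][cur] for prev, cur in zip(data, data[1:]))
    return total / (len(data) - 1)


def diverges(model, data, threshold=-4.0):
    """Flag data whose transitions the model finds very unlikely.

    The threshold is an illustrative value, not one taken from the article.
    """
    return mean_log_likelihood(model, data) < threshold
```

Trained on ordinary text, such a chain scores similar text far higher than random-looking bytes, which is the kind of gap a monitor could watch for. A real deployment would presumably need one model per file type, as the quote suggests.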

Minimal Impact on System Performance

Using Markov chains comes with an additional advantage: “This mathematical model is very light. Therefore, real-time monitoring will have only a minimal impact on system performance, which is important because otherwise people won't be inclined to use the tool. We have also done a huge amount of work to get rid of false alarms, actually achieving zero false positives. For instance, our model is accurate enough to ascertain that a JPEG alteration shouldn't be chalked up to an encryption but to a more innocuous image rotation.”
Once an apparently more sinister divergence has been noticed, the second part of this two-tier solution comes into play. “We run a chi-squared statistical test on the transformed data. It provides metrics for the data distribution within a given file. If, all of a sudden, we spot a pixel that is completely different from the previous one, there might be something fishy about it.” To assess how predictable the data values are, the scientists considered “several statistical estimators such as the Shannon entropy or the Kolmogorov test. After a long study, we concluded that the chi-squared test was actually best suited for this job.”
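As an illustration of the statistic involved, the Python sketch below computes a chi-squared value for the byte histogram of a buffer against a uniform distribution. The article does not say how DAD parameterizes its test, so the choice of a uniform reference, the function names and the cutoff are assumptions made for this example.

```python
from collections import Counter


def chi_squared_uniform(data):
    """Chi-squared statistic of the byte histogram against a uniform one.

    Well-encrypted data is close to uniform, so its statistic stays near
    the 255 degrees of freedom; structured data scores far higher.
    """
    if not data:
        raise ValueError("empty buffer")
    expected = len(data) / 256
    counts = Counter(data)
    return sum(
        (counts.get(value, 0) - expected) ** 2 / expected
        for value in range(256)
    )


def looks_encrypted(data, cutoff=400.0):
    """Hypothetical decision rule: a low statistic suggests ciphertext."""
    return chi_squared_uniform(data) < cutoff
```

The statistic is only meaningful on buffers of at least a few kilobytes, where each byte value is expected several times. Compressed data can also score low on it, so a cutoff like this one would need tuning on real files.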
In practical terms, “as soon as the alarm is raised, we back up all the opened files whereas other solutions would back up the whole disk.” In some cases, a doubt might arise as to whether the ongoing encryption is the result of a deliberate action by the user themselves, who wishes to encrypt some of their data. “Should this happen, a message will pop up on their screen asking if this is a legitimate encryption. In other cases, the tool will take the decision to kill the thread and restore all the data.”
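The response described here can be sketched in a few lines of Python. This is only an outline under stated assumptions: the function names, the callbacks `ask_user` and `kill_writer`, and the choice to restore when the user rejects the encryption are hypothetical, and the real tool works at a much lower level of the system.

```python
import shutil
from pathlib import Path


def backup_open_files(open_files, backup_dir):
    """Copy only the currently opened files, not the whole disk."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    saved = {}
    for index, path in enumerate(map(Path, open_files)):
        target = backup_dir / f"{index}_{path.name}"
        shutil.copy2(path, target)
        saved[path] = target
    return saved


def restore_files(saved):
    """Put every backed-up copy back in place of the altered file."""
    for original, copy in saved.items():
        shutil.copy2(copy, original)


def respond_to_alarm(open_files, backup_dir, ambiguous, ask_user, kill_writer):
    """Back up, then either let the user vouch for the encryption or undo it.

    ask_user and kill_writer are caller-supplied callbacks (assumptions):
    the first returns True if the user confirms a legitimate encryption,
    the second stops the offending thread.
    """
    saved = backup_open_files(open_files, backup_dir)
    if ambiguous and ask_user():
        return "allowed"
    kill_writer()
    restore_files(saved)
    return "restored"
```

Limiting the backup to opened files keeps the cost of an alarm proportional to what is actually at risk, which is the contrast with whole-disk backups drawn in the quote.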

WannaCry Sniffed Out Right Off the Bat

Being completely code-agnostic and focusing only on data alteration, the DAD countermeasure has an unmatched advantage: “We can sniff out newcomers right off the bat. No need to wait for their signatures to be added to an antivirus database. When WannaCry first showed up, we hadn't even heard about it, yet it was immediately quarantined by our tool.”
Meanwhile, the scientists came up with another idea. “For the time being, our model of the data use is generic. But why not move toward a self-adaptive model for each user? After all, in a company, a secretary and an engineer do not handle the same type of data. The former will use word processors and spreadsheets whereas the latter will go for more heterogeneous software such as Python IDEs, and so on. Therefore, by adapting to each and every user, our model would be much richer. That's what we're working on right now.”
This four-year research effort was conducted through the PhD work of Aurélien Palisse, whose thesis is funded by DGA-MI, the cybersecurity branch of the French defense procurement agency. “We were also helped by two DGA-MI research engineers: Colas Le Guernic, who co-directs this thesis, and David Lubicz, who came up with the idea of exploiting Markov chains. The next step for us is to deploy our solution over a large-scale company network in order to acquire more experience in the handling of remote maintenance, and so on. In the meantime, we will start further developments to transform what is still a research prototype into a commercial solution that will then be marketed on a per-seat licence basis.” Lastly, Lanet concludes, “it is worth noting that out of the very first batch of DGA-funded PhD works at the LHS, one is going to lead to real-world software.”