Defcon AIVillage 2022
I’m not Keylogging you! Just some benign data collection for User Behavior Modeling
User and Entity Behavior Analytics (UEBA) has been an active area of research in cybersecurity for years. Advancements in unsupervised machine learning have made UEBA models effective at detecting anomalous drifts from baseline behavior. But when collecting user-generated system data from a cluster of machines in the cloud or from an endpoint, the data scientist gets access to raw human-generated features: which keys are typed, and when. This starts off as acceptable but wades into the grey area of almost keylogging users, which is dangerous.
In this talk, we will go through a real example of how a user behavior experiment was set up: building the features, running the data collection script within containers, and flushing the raw data regularly so that users send only aggregated metrics to the data scientists for model building and analysis. We’ll cover the entire setup, from data collection and data flushing to model building with weak labels and further analysis.
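The aggregate-then-flush idea described above can be illustrated with a minimal Python sketch. It is not the script used in the experiment: the `KeyEvent` type, the choice of features (key count, inter-key gap statistics, backspace ratio), and the function names are all assumptions made for illustration. The point is only that raw key identities stay local and are discarded once summary metrics have been computed.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List


@dataclass
class KeyEvent:
    """Hypothetical raw event; the key identity is sensitive and stays local."""
    timestamp: float  # seconds since the start of the collection window
    key: str


def aggregate_window(events: List[KeyEvent]) -> Dict[str, float]:
    """Reduce one window of raw events to summary statistics without key content."""
    times = sorted(e.timestamp for e in events)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    backspaces = sum(e.key == "BACKSPACE" for e in events)
    return {
        "n_keys": float(len(events)),
        "mean_gap_s": mean(gaps) if gaps else 0.0,
        "std_gap_s": pstdev(gaps) if gaps else 0.0,
        "backspace_ratio": backspaces / len(events) if events else 0.0,
    }


def collect_and_flush(buffer: List[KeyEvent]) -> Dict[str, float]:
    """Aggregate the buffer, then clear it so raw keystrokes are not retained."""
    metrics = aggregate_window(buffer)
    buffer.clear()  # flush raw data; only `metrics` would be sent onward
    return metrics
```

In a setup like this, only the dictionary returned by `collect_and_flush` would leave the container, so the data scientist never sees which keys were typed.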
Conference on Applied Machine Learning in Information Security (CAMLIS) 2021
Using Undocumented Hardware Performance Counters to Detect Spectre-Style Attacks
In recent years, exploits like Spectre, Meltdown, Rowhammer, and Return Oriented Programming (ROP) have been detected using Hardware Performance Counters. But to date, only relatively simple and well-understood counters have been used, representing just a tiny fraction of the information we can glean from the system. What’s worse, using only well-known counters as detectors for these attacks has a huge disadvantage – an attacker can easily bypass known counter-based detection techniques with minimal changes to existing sample exploit code. Uncovering the treasure trove of overlooked and undocumented counters is necessary if we are to both build defenses against these attacks and anticipate how an adversary could bypass our defenses.
In this paper, we’ll first introduce our version of Spectre variant 4 with evasive changes that can bypass any detection relying on conventional cache miss, branch miss, and branch misprediction counters. We’ll then show how our model, using select undocumented counters, is able to detect this newly edited variant, and how it is also able to detect a novel Spectre implementation submitted to VirusTotal.
PyData Boston 2020
Topic modeling is a useful NLP technique for analyzing and classifying a large corpus of text. It helps us cluster unstructured text data into meaningful groups. In cybersecurity, filtering huge data logs to fish out leaked credentials and passwords is a time-consuming task for red teams. A red team consists of security professionals, ethical hackers who act as adversaries in order to evaluate system security objectively. In this talk, we will go through the basics of LDA (Latent Dirichlet Allocation) topic modeling. Then, applying this technique to a real-world example, we’ll go through a hacker’s system logs and try to filter out useful data, such as passcodes and credentials, that can be used for further security analysis.
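For readers unfamiliar with LDA, the following is a minimal, standard-library-only sketch of collapsed Gibbs sampling, one common way to fit an LDA model. It is not the code from the talk (which may well have used an existing library); the function name `lda_gibbs`, the hyperparameter defaults, and the toy `sample_logs` lines are assumptions for illustration only.

```python
import random
from collections import Counter
from typing import List, Tuple


def lda_gibbs(docs: List[List[str]], n_topics: int, n_iter: int = 200,
              alpha: float = 0.1, beta: float = 0.01, seed: int = 0
              ) -> Tuple[List[List[float]], List[Counter]]:
    """Collapsed Gibbs sampling for LDA.

    Returns (per-document topic proportions, per-topic word counts).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(n_topics)]   # count of word w in topic k
    topic_total = [0] * n_topics                        # total tokens in topic k
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word


# Toy, made-up log lines (not real data), already tokenized on whitespace.
sample_logs = [
    "ssh login user admin password accepted".split(),
    "ssh login user root password failed".split(),
    "cron job backup started disk usage".split(),
    "cron job backup finished disk usage".split(),
]
theta, topic_word = lda_gibbs(sample_logs, n_topics=2, n_iter=100)
```

Each row of `theta` gives one log line's mixture over topics, and `topic_word[k].most_common(5)` lists the words most associated with topic `k`. In a workflow like the one described, an analyst would inspect those top words to decide which topic groups credential-like lines.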
Video (From 34:37)
BlackHat USA 2020
In recent years, exploits such as speculative execution attacks, Rowhammer, and Return Oriented Programming (ROP) have been detected using hardware performance counters (HPCs). But to date, only relatively simple and well-understood counters have been used, representing just a tiny fraction of the information we can glean from the system. What’s worse, using only well-known counters as detectors for these attacks has a huge disadvantage – an attacker can easily bypass known counter-based detection techniques with minimal changes to existing sample exploit code.
If we want a viable future for exploit detection, we need to move beyond just scratching the surface of the HPC iceberg. Uncovering the treasure trove of overlooked and undocumented counters is necessary if we are to both build defenses against these attacks and anticipate how an adversary could bypass our defenses.
We’ll begin by walking through our ML-based solution for more effective exploit detection. Using the entire corpus of performance counters collected for commonly used baseline programs and behaviorally similar malicious programs, we zero in on the counters we want to use as features for our supervised classifiers. We will then interpret our models to determine how they can effectively detect various exploits using novel performance counters.
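One simple way to picture "zeroing in" on useful counters is to rank every counter by how well it separates benign runs from malicious ones. The sketch below uses a standardized mean difference as a stand-in scoring rule; it is not the selection method from the talk, and the counter names (`ctr_a`, `ctr_b`) and values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Sample = Dict[str, float]  # counter name -> value observed in one program run


def rank_counters(benign: List[Sample], malicious: List[Sample]
                  ) -> List[Tuple[float, str]]:
    """Rank counters by the gap between class means, scaled by overall spread."""
    scores = []
    for name in benign[0]:
        b = [s[name] for s in benign]
        m = [s[name] for s in malicious]
        spread = pstdev(b + m) or 1.0  # constant counters would otherwise divide by zero
        scores.append((abs(mean(m) - mean(b)) / spread, name))
    return sorted(scores, reverse=True)  # most discriminative counter first


# Hypothetical readings: ctr_a shifts under attack, ctr_b does not.
benign_runs = [{"ctr_a": 10.0, "ctr_b": 5.0}, {"ctr_a": 11.0, "ctr_b": 6.0}]
malicious_runs = [{"ctr_a": 50.0, "ctr_b": 5.0}, {"ctr_a": 52.0, "ctr_b": 6.0}]
ranking = rank_counters(benign_runs, malicious_runs)
```

The top-ranked names would then be candidate features for a supervised classifier. A univariate score like this ignores interactions between counters, which is one reason model-based importance measures are often preferred in practice.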
Finally, we’ll showcase the uncommon and previously ignored performance counters that were lurking in the dark while carrying so much useful information. The results will underline the need to document these counters, which were highly significant in our attack detection models.
Open Data Science Conference (ODSC) East 2019
Machine learning is proving to be an important tool against cyber attacks, especially in finding zero-day threats and in behavioral threat detection. Here, we will look at “Meltdown” and “Spectre”, two attacks that exploit critical vulnerabilities in modern computer processors and that took the cyber world by storm when they were disclosed in early 2018. These hardware vulnerabilities allow programs to steal data that is being processed on the computer. We will walk through a Jupyter notebook that demonstrates the entire process of raw CPU data collection, data wrangling, machine learning experiments, and final model selection to successfully detect Spectre and Meltdown attacks as they happen in real time on a Linux system. The final machine learning model is the basis for the actual threat detection strategy that is engineered into the security product.