Many USB File Copies for User

Description

Build a baseline of how many files each user copies to USB media, and detect when a user copies an uncharacteristically large number of files.


Use Case

Insider Threat

Category

Insider Threat

Security Impact

Data exfiltration is top of mind for most security organizations. Copying data to USB is a top means for exfiltrating large and small volumes of data, so detecting that type of activity is key.

Alert Volume

High

SPL Difficulty

Medium

Journey

Stage 3

MITRE ATT&CK Tactics

Exfiltration

MITRE ATT&CK Techniques

Exfiltration Over Physical Medium

Kill Chain Phases

Actions on Objectives

Data Sources

DLP
Endpoint Detection and Response

How to Implement

This detection relies on visibility into files copied to USB. There are two common paths to building that visibility. The first (and most common) is commercial DLP or EDR software that tracks these activities. The second, and cheaper, is a little-known Group Policy option that instructs Windows itself to log these activities in the Windows Security Event Log.

Known False Positives

This is a strictly behavioral search, so we define "false positive" slightly differently. Every time this fires, it will accurately reflect a spike in the number we're monitoring; it's nearly impossible for the math to lie. But while there are really no "false positives" in the traditional sense, there is definitely the opportunity for lots of noise.

How To Respond

When this alert fires, the immediate questions should be: what data was copied, and was it sensitive? As SOC staff rarely have a very deep understanding of the sensitivity of the actual data (if you have data categorization fully implemented in your organization, kudos! You are the only organization in the world), it's usually most prudent to look at whether this user is in a position to have access to sensitive data. There are a few approaches to this process: guessing from titles / departments (e.g., research scientist versus customer service representative), pulling a list of privileged users, or others. Very mature organizations will also track authorized USB devices (either via encryption or other means), where copying data to unauthorized / unencrypted devices carries different consequences than copying to devices that are part of standard operating procedures. Finally, it's often realistic to make some determination of risk based on the names of the files copied, though it's important not to be overly reliant on that indication (if I were to exfiltrate data, I would likely create a zip file called expense_report_receipts.zip or something equally mundane).

Ultimately, as with many behavioral detections, you probably don't want to evaluate this alert in a vacuum, as you would end up with an excessive amount of noise. Sure, alert directly if this alert occurs for users who have access to sensitive data, but generally speaking you should combine this alert with other suspicious indications such as HR issues, recent (or future) separation, etc.
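The triage logic described above can be sketched in a few lines of Python. This is only an illustration: the UserContext type, the indicator names, and the "alert" / "log" / "ignore" outcomes are hypothetical and would need to be mapped onto whatever your own risk framework provides.

```python
from dataclasses import dataclass, field


@dataclass
class UserContext:
    # Hypothetical context you would populate from HR or identity data.
    has_sensitive_access: bool = False
    # Other suspicious indications, e.g. {"hr_issue", "pending_separation"}.
    risk_indicators: set = field(default_factory=set)


def triage(spike_fired: bool, ctx: UserContext) -> str:
    """Decide what to do with a USB copy spike for one user."""
    if not spike_fired:
        return "ignore"
    # Alert directly for users who have access to sensitive data.
    if ctx.has_sensitive_access:
        return "alert"
    # Otherwise only alert when the spike coincides with another indication.
    if ctx.risk_indicators:
        return "alert"
    # A spike on its own is kept as context rather than raised.
    return "log"
```

A spike for a user with no sensitive access and no other indications ends up as "log", which keeps the event available for later correlation without adding to the analyst queue.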

Help

Many USB File Copies for User Help

This example leverages the Detect Spikes (standard deviation) search assistant. Our dataset is an anonymized data collection from an actual customer environment.

SPL for Many USB File Copies for User

Demo Data

First we pull in our demo dataset.
Bucket (aliased to bin) allows us to group events based on _time, effectively flattening the actual _time value to the same day.
Finally, we can count and aggregate per user, per day.
Then we calculate the mean, standard deviation and most recent value.
Lastly, we calculate the bounds as a multiple of the standard deviation.
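To make the math behind these steps concrete, here is a minimal Python sketch of the same idea: bucket events per day, count per user, then compare the most recent day against the mean plus a multiple of the standard deviation. It is not the SPL used by the search assistant. The function names, the multiplier of 3, and the choice to exclude the most recent day from the baseline are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import date, datetime
from statistics import mean, stdev


def daily_counts(events):
    """Turn (user, datetime) pairs into {user: {day: count}}."""
    counts = defaultdict(Counter)
    for user, ts in events:
        # Flatten the timestamp to its day, like bucketing _time by 1d.
        counts[user][ts.date()] += 1
    return counts


def find_spikes(counts, today, multiplier=3.0, min_days=2):
    """Return {user: (latest, upper_bound)} for users above their bound."""
    spikes = {}
    for user, per_day in counts.items():
        # Baseline: every day before today (assumption: today is excluded).
        # Days with no copies at all are absent, so they do not pull the
        # mean down.
        history = [n for day, n in per_day.items() if day < today]
        if len(history) < min_days:
            continue  # stdev needs at least two data points
        latest = per_day.get(today, 0)
        upper = mean(history) + multiplier * stdev(history)
        if latest > upper:
            spikes[user] = (latest, upper)
    return spikes
```

Called as find_spikes(daily_counts(events), date.today()), this returns only the users whose count for today exceeds their own baseline. A higher multiplier trades sensitivity for less noise.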

Live Data

First we pull in our Windows security log, filtering to removable storage events that were introduced in Windows 10, and specifically to file writes.
In our lab testing datasets, we found it prudent to filter to just removable hard disk, particularly excluding CD drives. We do this via the regex search command, which allows us to flexibly match the Object_Name.
Bucket (aliased to bin) allows us to group events based on _time, effectively flattening the actual _time value to the same day.
Finally, we can count and aggregate per user, per day.
Then we calculate the mean, standard deviation and most recent value.
Lastly, we calculate the bounds as a multiple of the standard deviation.
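The filtering step for live data can be illustrated with a short Python sketch. Only Object_Name comes from the description above. The event ID 4663, the Accesses field, the "WriteData" access string, and the \Device\HarddiskVolume and \Device\CdRom path prefixes are assumptions about how these events typically look, so verify them against your own logs before relying on them.

```python
import re

# Assumption: removable hard disks show up as \Device\HarddiskVolume<N>\...
# while optical drives show up as \Device\CdRom<N>\...
REMOVABLE_DISK = re.compile(r"^\\Device\\HarddiskVolume\d+\\", re.IGNORECASE)


def is_usb_file_write(event: dict) -> bool:
    """Return True for a file write to a removable hard disk.

    `event` is a hypothetical dict of already-extracted fields.
    """
    return (
        event.get("EventCode") == 4663  # assumed object-access event ID
        and "WriteData" in event.get("Accesses", "")  # file writes only
        and REMOVABLE_DISK.match(event.get("Object_Name", "")) is not None
    )
```

Events that pass this filter would then go through the same per-user, per-day counting and standard deviation check as the demo data.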

Screenshot of Demo Data