Increase in Pages Printed

Description

Find users who printed more pages than normal.


Use Case

Insider Threat

Category

Data Exfiltration

Security Impact

It may seem inefficient and old-fashioned, but a sudden jump in the number of pages a user prints from networked printers, compared to what is "normal" for them, could be a sign of data exfiltration. Sensitive data could be leaving your corporation, literally in black-and-white! It is particularly interesting to correlate this behavior with a watchlist containing the user IDs of personnel considered higher risk: contractors, new employees, employees who never go on vacation, employees with access to particularly sensitive data. Often, the data gathered by Splunk can include the destination printer(s), the source of the print jobs, the names of the files printed, and even whether the output was black-and-white or color.

Alert Volume

Medium

SPL Difficulty

Hard

Journey

Stage 1

MITRE ATT&CK Tactics

Exfiltration

MITRE ATT&CK Techniques

Exfiltration Over Physical Medium

Kill Chain Phases

Actions on Objectives

Data Sources

User Activity Audit

   How to Implement

Implementation of this example (or any of the Time Series Spike / Standard Deviation examples) is generally pretty simple.

  • Validate that you have the right data onboarded, and that the fields you want to monitor are properly extracted. If the base search sketched just after this list returns results, you're in good shape.
  • Save the search to run over a long period of time (recommended: at least 30 days).
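
A minimal sketch of such a base search, assuming Uniflow-style print server events with User and NumPages fields (the index and sourcetype placeholders must be adjusted for your environment):

index=<your_print_index> sourcetype=<your_print_sourcetype> User=* NumPages=*
| table _time, User, NumPages

If this returns events with the user and page-count fields populated, the spike detection searches further down should work against your data.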

For most environments, these searches can be run once a day, often overnight, without worrying too much about a slow search. If you wish to run this search more frequently, or if it is too slow for your environment, we recommend using a summary index that aggregates the data first. We will have documentation for this process shortly, but for now you can find descriptions of summary indexing in the Splunk documentation.
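
As a rough sketch of the summary-index approach, a nightly scheduled search can total pages per user, per day, and write the results to a summary index (the index name print_summary is an assumption; the inline ``` ``` comments require Splunk 8.0 or later and can be removed):

index=<your_print_index> sourcetype=<your_print_sourcetype> User=* NumPages=*
``` flatten _time to the day, then total pages per user, per day ```
| bin _time span=1d
| stats sum(NumPages) as NumPages by User, _time
``` write the daily totals to the summary index ```
| collect index=print_summary

The spike detection search can then read from index=print_summary instead of the raw events, which keeps the daily run fast.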

   Known False Positives

This is a strictly behavioral search, so we define "false positive" slightly differently. Every time this fires, it will accurately reflect a spike in the number we're monitoring; it's nearly impossible for the math to lie. So while there are really no "false positives" in a traditional sense, there is definitely a lot of noise.

How you handle these alerts depends on where you set the standard deviation threshold. If you set a low threshold (2 or 3 standard deviations), you are likely to get a lot of events that are useful mainly as contextual information. If you set a high threshold (6 or 10 standard deviations), the noise can be reduced enough to send alerts directly to analysts.
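
In SPL terms, that threshold is just a multiplier in the final bound calculation. A minimal sketch, assuming the fields avg, stdev, and recent have already been computed per user:

| eval upperBound=(avg + stdev*6)
| where recent > upperBound

Lowering the 6 to 2 or 3 surfaces more users for context; raising it keeps only the most dramatic spikes.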

   How To Respond

When this search returns values, initiate your incident response process and validate the user account running these print jobs. If possible, determine which printer was used, what files were printed, and the time frame during which the printing occurred. Contact the user to determine whether the activity was authorized, and document whether it was and by whom. If it was not, the user's credentials may have been used by another party, and additional investigation is warranted, since excessive printing can be a way to exfiltrate sensitive data.

   Help

Increase in Pages Printed Help

This example leverages the Detect Spikes (standard deviation) search assistant. Our demo dataset is an anonymized collection of print server logs from a Uniflow print server. For this analysis, we track the total number of pages each user printed per day ('sum(NumPages) by User, _time'). We then calculate the average, the standard deviation, and the most recent value, and filter out any users whose most recent value is within the configurable number of standard deviations of their average.

SPL for Increase in Pages Printed

Demo Data

First we pull in our demo dataset.
Bucket (aliased to bin) allows us to group events based on _time, effectively flattening the actual _time value to the same day.
Then we can count and aggregate per user, per day.
Next, we calculate the mean, standard deviation, and most recent value.
Finally, we calculate the bounds as a multiple of the standard deviation.
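
A sketch of those steps, assuming the demo events live in a lookup named print_server_demo.csv with _time, User, and NumPages fields (the lookup name and the multiplier of 4 are assumptions):

| inputlookup print_server_demo.csv
``` flatten _time to the day, then total pages per user, per day ```
| bin _time span=1d
| stats sum(NumPages) as NumPages by User, _time
``` per user: average, standard deviation, and the most recent daily total ```
| stats avg(NumPages) as avg, stdev(NumPages) as stdev, latest(NumPages) as recent by User
``` keep only users whose latest day exceeds the average by more than 4 standard deviations ```
| eval upperBound=(avg + stdev*4)
| where recent > upperBound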

Live Data

First we pull in our printer dataset.
Bucket (aliased to bin) allows us to group events based on _time, effectively flattening the actual _time value to the same day.
Then we can count and aggregate per user, per day.
Next, we calculate the mean, standard deviation, and most recent value.
Finally, we calculate the bounds as a multiple of the standard deviation.
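
Against live data the shape is the same; only the first line changes, with the index and sourcetype as placeholders for your own print server logs:

index=<your_print_index> sourcetype=<your_print_sourcetype> User=* NumPages=*
| bin _time span=1d
| stats sum(NumPages) as NumPages by User, _time
| stats avg(NumPages) as avg, stdev(NumPages) as stdev, latest(NumPages) as recent by User
| eval upperBound=(avg + stdev*4)
| where recent > upperBound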

Accelerated with Data Models

Here, tstats pulls a super-fast count per user, per day, in a single command.
Next, we calculate the mean, standard deviation, and most recent value.
Finally, we calculate the bounds as a multiple of the standard deviation.
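
A sketch of the accelerated variant, assuming you have built and accelerated a custom data model for print activity (the data model name Print_Activity and its User and NumPages fields are assumptions; substitute your own accelerated data model):

| tstats sum(Print_Activity.NumPages) as NumPages from datamodel=Print_Activity by Print_Activity.User _time span=1d
``` drop the data model prefix so the rest of the search reads naturally ```
| rename "Print_Activity.User" as User
| stats avg(NumPages) as avg, stdev(NumPages) as stdev, latest(NumPages) as recent by User
| eval upperBound=(avg + stdev*4)
| where recent > upperBound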