User Finding Project Code Names from Many Departments

Description

Find users trying to collect and analyze internal projects from across multiple departments by analyzing their search logs on company wiki software.


Use Case

Insider Threat

Category

Insider Threat

Alert Volume

Low

SPL Difficulty

Medium

Journey

Stage 4

MITRE ATT&CK Tactics

Collection

MITRE ATT&CK Techniques

Data from Information Repositories

MITRE Threat Groups

APT28
Ke3chang

Kill Chain Phases

Actions on Objectives

Data Sources

Web Server

How to Implement

There are two key components to implementing this particular detection. First, you need to make sure that you have the right Confluence logs -- run the base search, or look for Confluence logs in your environment by searching for something like dosearchsite (that's how I found them!) and record the index and sourcetype. If you use a different internal wiki (such as SharePoint), you will need to alter the search to pull the search logs for that system. The second piece is harder -- in order to find project code names being searched in your logs, you have to know what those code names are (and, for this detection, which department each one belongs to). You will have to reach out to different departments in your organization to gather this knowledge, but once you have it you can mirror the format of the sample sse_project_codenames lookup.
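To make that concrete, here is a minimal sketch. The index, time range, code names, and departments below are illustrative placeholders; only the codeword and department column names come from the sample lookup. A quick way to hunt for Confluence search traffic and record the index and sourcetype might look like:

    index=* "dosearchsite" earliest=-24h
    | stats count by index, sourcetype

The sse_project_codenames lookup itself is just a CSV with one row per code name, for example:

    codeword,department
    Project Eagle,Engineering
    Project Leprechaun,Finance
    Project Narwhal,Human Resources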

Known False Positives

Because we're using a static threshold here for the number of different departments, you will need to adjust this threshold to suit your organization. Because these types of events are inherently fairly bursty (someone catches up on their email, someone switches into a project management role, etc.), it's difficult to solve this with ML but relatively easy to understand given business context. This alert, in isolation, is often benign for exactly the reasons listed above.

How To Respond

Because these activities can be benign (see Known False Positives), look for other indications of suspicious behavior by this user, or validate with their management or HR that the behavior is expected.

Help

User Finding Project Code Names from Many Departments Help

This example leverages the Detect Spikes (standard deviation) search assistant. Our dataset is an anonymized collection of Confluence (an internal wiki) logs, centered on a few users over two months.

SPL for User Finding Project Code Names from Many Departments

Demo Data

First we bring in our basic demo dataset -- in this case, anonymized Confluence logs. We're using a macro called Load_Sample_Log_Data to wrap | inputlookup, just so the demo data loads more cleanly. (A sketch of the full pipeline follows these step notes.)
Next we filter for just search history in Confluence.
While you wouldn't have to do this with live data, for our sample data we're going to extract the queryString explicitly.
Next we use eval's urldecode function to convert plus signs to spaces and decode any other URL encoding that might exist.
Now that we have everything looking clean, we're going to use a regex to extract project code names from the search string. Normally with rex you would include the regular expression here as a quoted string (much less scary). We're going to make this more complicated by using a subsearch, but it has the benefit that you don't have to enter the code names twice. We'll explain the subsearch in the next line.
The goal of this line is to return a single string containing all the project code names as a field extraction, like "(?<codename>Project Eagle|Project Leprechaun)" (the capture group name here is just illustrative). This is fairly advanced SPL, so don't worry if it doesn't make sense to you, but the meat of it is that we have a lookup called sse_project_codenames with a column for codeword and a column for department. When a subsearch returns a field named "search", most of SPL interprets its value literally, so it inserts the regex happily. If this doesn't make sense to you, read up on subsearches on docs.splunk.com, and then you can always just copy-paste the demo SPL and try it out!
Now we use the lookup command to pull from the CSV file which department each project code name belongs to.
For simplicity, we group events together by day (you might look at this by the hour, or use the transaction command to give you a rolling window -- there are lots of approaches).
Now we use stats to take the distinct count of department (the number of unique departments) whose code words were searched, per user, per day.
Finally we filter for users who searched code names from five or more different departments.
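Putting those steps together, here is a minimal sketch of the demo search. The macro argument, the uri, queryString, and user field names, and the codename capture group are assumptions made for illustration; the SPL shipped with the app may use different names.

    `Load_Sample_Log_Data("Confluence")` ``` demo data macro; the argument here is a placeholder ```
    | search uri="*dosearchsite*" ``` keep only Confluence search-history events ```
    | rex field=_raw "queryString=(?<queryString>[^&\s]+)" ``` demo data only: extract the raw query string ```
    | eval queryString=urldecode(queryString)
    ``` build one big regex from the lookup; the subsearch returns it in a field named "search" ```
    | rex field=queryString
        [| inputlookup sse_project_codenames
         | stats values(codeword) as codeword
         | eval search="\"(?<codename>" . mvjoin(codeword, "|") . ")\""
         | fields search]
    | lookup sse_project_codenames codeword as codename OUTPUT department
    | bin _time span=1d
    | stats dc(department) as dc_department by user _time
    | where dc_department >= 5 ``` static threshold: tune to your organization ```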

Live Data

First we bring in our dataset, filtered for just the search history in Confluence. (A sketch of the full live-data search follows these step notes.)
Next we use eval's urldecode function to convert plus signs to spaces and decode any other URL encoding that might exist.
Now that we have everything looking clean, we're going to use a regex to extract project code names from the search string. Normally with rex you would include the regular expression here as a quoted string (much less scary). We're going to make this more complicated by using a subsearch, but it has the benefit that you don't have to enter the code names twice. We'll explain the subsearch in the next line.
The goal of this line is to return a single string containing all the project code names as a field extraction, like "(?<codename>Project Eagle|Project Leprechaun)" (the capture group name here is just illustrative). This is fairly advanced SPL, so don't worry if it doesn't make sense to you, but the meat of it is that we have a lookup called sse_project_codenames with a column for codeword and a column for department. When a subsearch returns a field named "search", most of SPL interprets its value literally, so it inserts the regex happily. If this doesn't make sense to you, read up on subsearches on docs.splunk.com, and then you can always just copy-paste the demo SPL and try it out!
Now we use the lookup command to pull from the CSV file which department each project code name belongs to.
For simplicity, we group events together by day (you might look at this by the hour, or use the transaction command to give you a rolling window -- there are lots of approaches).
Now we use stats to take the distinct count of department (the number of unique departments) whose code words were searched, per user, per day.
Finally we filter for users who searched code names from five or more different departments.
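For live data, the same pipeline starts from your own index instead of the demo macro. The index, sourcetype, and field names below are illustrative placeholders; substitute the values you recorded during implementation.

    index=confluence sourcetype=confluence:access uri="*dosearchsite*" ``` placeholders: use your own index, sourcetype, and URI field ```
    | eval queryString=urldecode(queryString)
    ``` build one big regex from the lookup; the subsearch returns it in a field named "search" ```
    | rex field=queryString
        [| inputlookup sse_project_codenames
         | stats values(codeword) as codeword
         | eval search="\"(?<codename>" . mvjoin(codeword, "|") . ")\""
         | fields search]
    | lookup sse_project_codenames codeword as codename OUTPUT department
    | bin _time span=1d
    | stats dc(department) as dc_department by user _time
    | where dc_department >= 5 ``` static threshold: tune to your organization ```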

Screenshot of Demo Data