First Time Accessing an Internal Git Repository

Description

Find users who accessed a Git repository for the first time.


Use Case

Insider Threat, Advanced Threat Detection

Category

Data Exfiltration

Security Impact

This is an insider threat use case. Developers are typically granted access to the Git repositories (or other software life cycle repositories) that their responsibilities require, but one condition to watch for is the first time a user accesses a given repository. This could be perfectly normal, or, if the repository contains code not relevant to the developer's role, it could be an anomaly worth investigating.

Alert Volume

High

SPL Difficulty

Medium

Journey

Stage 3

MITRE ATT&CK Tactics

Collection

MITRE ATT&CK Techniques

Data from Information Repositories

MITRE Threat Groups

APT28
Ke3chang

Kill Chain Phases

Actions on Objectives

Data Sources

Web Server

How to Implement

Implementation of this example (or any of the First Time Seen examples) is generally very simple.

  • Validate that you have the right data onboarded, and that the fields you want to monitor are properly extracted.
  • Save the search.

For most environments, these searches can be run once a day, often overnight, without worrying too much about a slow search. If you wish to run this search more frequently, or if this search is too slow for your environment, we recommend leveraging a lookup cache. For more on this, see the lookup cache dropdown below and select the sample item. A window will pop up telling you more about this feature.

Note: We include an accelerated version to show how this would work, but there is no data model for this out of the box, so you would need to build one yourself.
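The lookup cache idea mentioned above can be sketched outside of Splunk. The following is a minimal Python illustration, not the search shipped with this example: it assumes a cache keyed by (user, repo) that stores the earliest and latest time each pair was seen, so each run only needs to process new events instead of the full history. The function name `update_cache`, the epoch timestamps, and the repository names are all hypothetical.

```python
def update_cache(cache, events):
    """Fold a batch of (epoch_seconds, user, repo) events into the cache.

    cache maps (user, repo) -> {"earliest": int, "latest": int} and is
    modified in place. Returns the pairs that were not in the cache before
    this batch, i.e. the first-time accesses.
    """
    new_pairs = []
    for ts, user, repo in events:
        key = (user, repo)
        entry = cache.get(key)
        if entry is None:
            # First time this user/repo combination has been seen.
            cache[key] = {"earliest": ts, "latest": ts}
            new_pairs.append(key)
        else:
            entry["earliest"] = min(entry["earliest"], ts)
            entry["latest"] = max(entry["latest"], ts)
    return new_pairs


# Build the cache once from historical data (hypothetical values).
cache = {}
update_cache(cache, [
    (1704099600, "alice", "uba-core"),
    (1704186000, "chuck", "uba-ui"),
])

# Each later run only processes the events since the previous run.
new_today = update_cache(cache, [
    (1704790800, "chuck", "uba-ui"),
    (1704794400, "chuck", "payments"),
])
```

The trade-off is the one described above: the first run pays for the full history, and every subsequent run touches only recent events plus the cache.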

Known False Positives

This is a strictly behavioral search, so we define "false positive" slightly differently. Every time this fires, it will accurately reflect the first occurrence in the time period you're searching over (or for the lookup cache feature, the first occurrence over whatever time period you built the lookup). But while there are really no "false positives" in a traditional sense, there is definitely lots of noise.

You should not review these alerts directly (except for access to extremely sensitive repositories), but instead use them for context, or to aggregate risk (as mentioned under How To Respond).

How To Respond

When this search returns values, initiate your incident response process and identify the user account accessing the specific repo. Contact the user and their manager to determine whether the access was authorized. If the user says they did not access this repo, attempt to determine whether the user's credentials were stolen and used by another party.

Help

First Time Accessing an Internal Git Repository Help

This example leverages the Detect New Values search assistant. Our dataset is the Splunk-internal Git source checkout history for a couple of our Splunk UBA software developers, anonymized to Alice and Chuck. On the last day, we added in a few more developers who visit other repositories, but set their usernames to Chuck so that it looks like he started downloading from a bunch of repositories that he's never touched before. We also have a user Bob, who has checked out from a few other repositories in the past and is on Chuck's team. For this analysis, we look at the first time a username has checked out from a given repository name, and alert if that first checkout was in the last day. We can also filter by peer group, to exclude those repositories that Bob (on Chuck's team) had accessed before.
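The peer-group filter described above can be illustrated with a short Python sketch. This is only an illustration of the logic, under the assumption that team membership is available as a simple mapping; the function name `filter_by_peer_group`, the `peers` mapping, and the repository names are hypothetical and do not come from the dataset.

```python
def filter_by_peer_group(new_pairs, history, peers):
    """Drop first-time (user, repo) pairs that a peer has accessed before.

    new_pairs: list of (user, repo) first-time accesses to evaluate.
    history:   iterable of (user, repo) pairs seen in the past.
    peers:     dict mapping a user to the users on the same team.
    """
    repos_by_user = {}
    for user, repo in history:
        repos_by_user.setdefault(user, set()).add(repo)

    kept = []
    for user, repo in new_pairs:
        # Collect every repo that any of this user's peers has touched.
        peer_repos = set()
        for peer in peers.get(user, ()):
            peer_repos |= repos_by_user.get(peer, set())
        if repo not in peer_repos:
            kept.append((user, repo))
    return kept


# Hypothetical data: Bob is on Chuck's team and has used uba-models before.
history = [
    ("chuck", "uba-ui"),
    ("bob", "uba-ui"),
    ("bob", "uba-models"),
    ("alice", "uba-core"),
]
peers = {"chuck": ["bob"]}
candidates = [("chuck", "uba-models"), ("chuck", "payments")]
remaining = filter_by_peer_group(candidates, history, peers)
```

Here only the repository that nobody on Chuck's team has touched survives the filter, which is the noise reduction the peer-group step is meant to provide.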

SPL for First Time Accessing an Internal Git Repository

Demo Data

First, we pull in our demo dataset.
Here we use the stats command to calculate the earliest and latest times at which we have seen this combination of fields.
Next, we calculate the most recent timestamp in our demo dataset.
We end by checking whether the earliest time we've seen this value falls within the last day of our demo dataset.
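The four steps above can be sketched in Python to make the logic concrete. This is an illustration under stated assumptions, not the SPL used by the example: events are assumed to be (timestamp, user, repo) tuples, and the function name, dates, and repository names are hypothetical.

```python
from datetime import datetime, timedelta


def first_seen_in_last_day(events):
    """Return (user, repo) pairs first seen within the final day of the data."""
    earliest = {}
    latest = {}
    # Equivalent of the stats step: earliest and latest time per combination.
    for ts, user, repo in events:
        key = (user, repo)
        if key not in earliest or ts < earliest[key]:
            earliest[key] = ts
        if key not in latest or ts > latest[key]:
            latest[key] = ts
    if not latest:
        return []
    # Most recent timestamp anywhere in the dataset; demo data is static,
    # so this stands in for "now".
    max_latest = max(latest.values())
    cutoff = max_latest - timedelta(days=1)
    return sorted(key for key, ts in earliest.items() if ts >= cutoff)


# Hypothetical demo events.
demo_events = [
    (datetime(2024, 1, 1, 9), "alice", "uba-core"),
    (datetime(2024, 1, 2, 9), "chuck", "uba-ui"),
    (datetime(2024, 1, 9, 10), "chuck", "uba-ui"),
    (datetime(2024, 1, 9, 11), "chuck", "payments"),
    (datetime(2024, 1, 9, 12), "alice", "uba-core"),
]
new_pairs = first_seen_in_last_day(demo_events)
```

Anchoring the cutoff to the newest event rather than the wall clock is what lets a static demo dataset keep producing results long after it was recorded.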

Live Data

First, we start with our Atlassian Bitbucket access logs (Bitbucket is Atlassian's commercial Git repository hosting product).
Next, we extract the fields we will use. These regular expressions have worked in a couple of environments, but you should verify them in yours.
Then we filter for just the logs that include a git_repo field.
Here we use the stats command to calculate the earliest and latest times at which we have seen this combination of fields.
We end by checking whether the earliest time we've seen this value falls within the last day.
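As a rough illustration of the live-data steps above, the Python sketch below extracts a user and a git_repo value from log lines, skips lines without a repository, and flags pairs first seen in the last day. The log line layout and the `/scm/<project>/<repo>.git` URL pattern are assumptions made for this sketch, not the actual Bitbucket access log format or the regular expressions used by the search, so verify any real patterns against your own data.

```python
import re
from datetime import datetime, timedelta

# Assumed line layout (hypothetical): "<ISO timestamp> <user> <method> <path>"
LINE_RE = re.compile(r"^(?P<ts>\S+) (?P<user>\S+) \S+ (?P<path>\S+)$")
# Assumed URL layout (hypothetical): /scm/<project>/<repo>.git/...
REPO_RE = re.compile(r"^/scm/(?P<git_repo>[^/]+/[^/]+)\.git(?:/|$)")


def parse_line(line):
    """Return (timestamp, user, git_repo), or None if no git_repo is present."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    repo = REPO_RE.match(m.group("path"))
    if not repo:
        return None
    return datetime.fromisoformat(m.group("ts")), m.group("user"), repo.group("git_repo")


def first_time_access(lines, now):
    """Return (user, git_repo) pairs whose earliest access is within the last day."""
    earliest = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # keep only the logs that include a git_repo field
        ts, user, repo = parsed
        key = (user, repo)
        if key not in earliest or ts < earliest[key]:
            earliest[key] = ts
    cutoff = now - timedelta(days=1)
    return sorted(key for key, ts in earliest.items() if ts >= cutoff)


# Hypothetical log lines.
sample_lines = [
    "2024-01-02T09:00:00 chuck GET /scm/uba/ui.git/info/refs",
    "2024-01-09T11:00:00 chuck GET /scm/fin/payments.git/info/refs",
    "2024-01-09T11:05:00 chuck GET /projects/fin/browse",
    "2024-01-09T12:00:00 chuck GET /scm/uba/ui.git/info/refs",
]
alerts = first_time_access(sample_lines, now=datetime(2024, 1, 9, 18))
```

Passing `now` in explicitly keeps the function easy to test; a scheduled job would supply the current time.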

Accelerated Data

Here, tstats pulls a very fast count per user, per repo, per day in a single command.
(self-explanatory)
Here we use the stats command to calculate the earliest and latest times at which we have seen this combination of fields.
We end by checking whether the earliest time we've seen this value falls within the last day.
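The accelerated approach works on pre-aggregated daily counts instead of raw events. The Python sketch below mimics that shape under stated assumptions: `daily_counts` stands in for the per user, per repo, per day summary, and `first_seen_from_daily` applies the same first-seen test to it. Both function names and the sample data are hypothetical, and this is not the tstats search itself.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_counts(events):
    """Summarize raw (timestamp, user, repo) events into per-day counts."""
    return Counter((ts.date(), user, repo) for ts, user, repo in events)


def first_seen_from_daily(counts, today):
    """Return (user, repo) pairs whose first active day is within the last day."""
    first_day = {}
    for day, user, repo in counts:
        key = (user, repo)
        if key not in first_day or day < first_day[key]:
            first_day[key] = day
    cutoff = today - timedelta(days=1)
    return sorted(key for key, day in first_day.items() if day >= cutoff)


# Hypothetical events.
events = [
    (datetime(2024, 1, 2, 9), "chuck", "uba-ui"),
    (datetime(2024, 1, 9, 10), "chuck", "uba-ui"),
    (datetime(2024, 1, 9, 11), "chuck", "payments"),
    (datetime(2024, 1, 9, 11, 30), "chuck", "payments"),
]
counts = daily_counts(events)
flagged = first_seen_from_daily(counts, today=date(2024, 1, 9))
```

Because the summary has day granularity, the first-seen test can only be as precise as one day, which is sufficient for a search that runs daily.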