Emails with Lookalike Domains

Description

Sending email from a domain name that closely resembles the target's own is a common phishing technique; for example, someone at splunk.com might receive an email from spiunk.com. This search detects those lookalike domains.


Use Case

Advanced Threat Detection

Category

Endpoint Compromise, SaaS

Alert Volume

Very Low

SPL Difficulty

Advanced

Journey

Stage 4

MITRE ATT&CK Tactics

Initial Access

MITRE ATT&CK Techniques

Spearphishing Link

MITRE Threat Groups

APT28
APT29
APT32
APT33
APT39
Cobalt Group
Dragonfly 2.0
Elderwood
FIN4
FIN8
Kimsuky
Leviathan
Machete
Magic Hound
Night Dragon
OilRig
Patchwork
Stolen Pencil
TA505
Turla

Kill Chain Phases

Delivery

Data Sources

Email

How to Implement

Implementing this search is generally straightforward. If you have CIM-compliant data onboarded, it should work out of the box. However, you are always better off specifying the index and sourcetype of your email data, particularly when you have multiple email log sources, such as a perimeter ESA and a core Exchange environment. If you have the right index and sourcetype, you have the src_user field, and you've installed the URL Toolbox app, it should work like a charm.

Known False Positives

This search will look through incoming emails for any domains similar to your domain names, much like running dnstwist on a domain name. Any incoming email whose source domain is very similar to, but not the same as, one of your domains will create an alert, and some of those alerts will be false positives. One might imagine a scenario where a company that manufactures wooden planks for pirate ships, plank.com, emails their sales rep at splunk.com. That would create a difference of 2 (u->a, extra s) and would be flagged (Arrrr!). Known examples of this could be filtered out in the search, or you could pipe this into a First Time Seen detection to automatically remove past examples.
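To make the "difference of 2" concrete, here is a minimal Python sketch of the Levenshtein edit distance. It is only an illustration of the metric, not the URL Toolbox implementation, and the function name `levenshtein` is chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


# Compare a few sender domains against our own domain.
for sender_domain in ("spiunk.com", "plank.com"):
    print(sender_domain, levenshtein(sender_domain, "splunk.com"))
```

Here spiunk.com scores 1 and plank.com scores 2, so a threshold of "less than three" flags both the real lookalike and the innocent plank manufacturer.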

How to Respond

When this search returns values, initiate your incident response process and capture the time of the event, the sender, the recipient, the subject of the mail, and any attachments. Contact the sender. If it is authorized behavior, document that this is authorized and by whom. If not, the user credentials may have been used by another party and additional investigation is warranted.

Help

Emails with Lookalike Domains Help

This example leverages the Simple Search assistant. Our dataset is an anonymized collection of email logs centered around a particular user for a month.

SPL for Emails with Lookalike Domains

Demo Data

First we start by pulling our demo email logs, where we have a source address (this could also work for proxy logs!)
This is an intensive exercise, so let's start by aggregating per source address, so we don't end up running over the same email many times
Next we are going to extract the domain -- probably this should actually occur before the last stats, but the performance is similar and this way it matches the accelerated search where this step is required.
Now we aggregate per actual domain we will analyze, for performance reasons
Let's filter out any domains that our organization owns and expects to receive email from. You can have several domains here (I recommend no more than 10-20 -- eventually URL Toolbox will get tired and stop adding Levenshtein fields, so you can look for null ut_levenshtein later if you are pushing this boundary).
Now we use the free URL Toolbox app to parse out subdomains from the top level domains. We want to analyze each one, so that an attacker can't send mycompany.yourithelpdesk.com and get through, or mail.mycampany.com.
The field we are going to pass to the Levenshtein algorithm is domain_detected, so let's add each subdomain to the multi-value field domain_detected.
This step is not required, but I like to filter down the list of fields mid-search just to make it easier for me to read and track it. URL Toolbox adds a *lot* of fields, but these four are the only fields I care about from now on.
Last piece of prep -- let's simplify everything down to exactly the two fields that URL Toolbox's Levenshtein algorithm is expecting.
Now the real magic: URL Toolbox is given two multi-value fields, and it does the cross checking to calculate the Levenshtein score for each combination. We pull out the lowest score from this group.
Now we filter for a Levenshtein score less than three (so two or fewer changes required to go from the domain to one of our standard domains). Those who have used Levenshtein are likely thinking: "Wait, what about the > 0 that we always use?" -- we accomplished that by filtering out standard domains way back at the start.
Finally we do some | fields and | rename so that everything looks nice and friendly for analysts to understand what we're looking at.
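
The steps above can be sketched outside of Splunk to show the overall shape of the logic. The Python below is a rough illustration only: it is not the SPL of this search, it skips the subdomain expansion step for brevity, and the names `OUR_DOMAINS`, `levenshtein`, and `lookalike_domains` are hypothetical.

```python
from collections import Counter

OUR_DOMAINS = ["splunk.com"]  # hypothetical list of domains we own


def levenshtein(a, b):
    """Edit distance: insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ch_a != ch_b)))
        previous = current
    return previous[-1]


def lookalike_domains(senders, our_domains=OUR_DOMAINS, max_distance=2):
    """Return (domain, email_count, score) for suspicious sender domains."""
    # Aggregate per sender domain so each domain is scored only once.
    per_domain = Counter(s.rsplit("@", 1)[-1].lower() for s in senders)
    hits = []
    for domain, count in per_domain.items():
        # Dropping our own domains up front is what makes a "> 0"
        # check on the score unnecessary later.
        if domain in our_domains:
            continue
        # Score against every domain we own and keep the lowest score.
        score = min(levenshtein(domain, ours) for ours in our_domains)
        if score <= max_distance:  # same as "less than three"
            hits.append((domain, count, score))
    return sorted(hits)


senders = ["alice@splunk.com", "bob@spiunk.com", "bob@spiunk.com",
           "carol@plank.com", "dave@example.org"]
print(lookalike_domains(senders))
```

With this sample input, spiunk.com and plank.com are returned, while splunk.com (our own domain) and example.org (too different) are not.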

Live Data

First we start by pulling our email logs, where we have a source address (this could also work for proxy logs!)
This is an intensive exercise, so let's start by aggregating per source address, so we don't end up running over the same email many times
Next we are going to extract the domain -- probably this should actually occur before the last stats, but the performance is similar and this way it matches the accelerated search where this step is required.
Now we aggregate per actual domain we will analyze, for performance reasons
Let's filter out any domains that our organization owns and expects to receive email from. You can have several domains here (I recommend no more than 10-20 -- eventually URL Toolbox will get tired and stop adding Levenshtein fields, so you can look for null ut_levenshtein later if you are pushing this boundary).
Now we use the free URL Toolbox app to parse out subdomains from the top level domains. We want to analyze each one, so that an attacker can't send mycompany.yourithelpdesk.com and get through, or mail.mycampany.com.
The field we are going to pass to the Levenshtein algorithm is domain_detected, so let's add each subdomain to the multi-value field domain_detected.
This step is not required, but I like to filter down the list of fields mid-search just to make it easier for me to read and track it. URL Toolbox adds a *lot* of fields, but these four are the only fields I care about from now on.
Last piece of prep -- let's simplify everything down to exactly the two fields that URL Toolbox's Levenshtein algorithm is expecting.
Now the real magic: URL Toolbox is given two multi-value fields, and it does the cross checking to calculate the Levenshtein score for each combination. We pull out the lowest score from this group.
Now we filter for a Levenshtein score less than three (so two or fewer changes required to go from the domain to one of our standard domains). Those who have used Levenshtein are likely thinking: "Wait, what about the > 0 that we always use?" -- we accomplished that by filtering out standard domains way back at the start.
Finally we do some | fields and | rename so that everything looks nice and friendly for analysts to understand what we're looking at.
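
The subdomain step above (adding each subdomain to domain_detected) is the least obvious part, so here is a small Python sketch of the idea. It is an approximation under a loud assumption: it treats the last label as the top level domain, whereas URL Toolbox parses domains properly. The function name `candidate_domains` is hypothetical.

```python
def candidate_domains(sender_domain):
    """Expand a sender domain into every string worth scoring.

    ASSUMPTION: the last label is the TLD. That is wrong for suffixes
    such as co.uk; a real implementation would use a public suffix list.
    """
    domain = sender_domain.lower()
    labels = domain.split(".")
    if len(labels) < 2:
        return [domain]
    tld = labels[-1]
    candidates = [domain]  # always score the full domain as sent
    for label in labels[:-1]:
        # Pair each label with the TLD so that a lookalike hidden in a
        # subdomain is still compared against our own domains.
        candidate = f"{label}.{tld}"
        if candidate not in candidates:
            candidates.append(candidate)
    return candidates


print(candidate_domains("mycompany.yourithelpdesk.com"))
print(candidate_domains("mail.mycampany.com"))
```

For mycompany.yourithelpdesk.com the candidates include mycompany.com, and for mail.mycampany.com they include mycampany.com, so both of the evasion attempts described above would still be scored against the organization's real domain.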

Accelerated Data

First we ask tstats to give us a list of source addresses for emails (with a count), and we rename it so that it's easier to work with.
Next we are going to extract the domain.
Now we aggregate per actual domain we will analyze, for performance reasons
Let's filter out any domains that our organization owns and expects to receive email from. You can have several domains here (I recommend no more than 10-20 -- eventually URL Toolbox will get tired and stop adding Levenshtein fields, so you can look for null ut_levenshtein later if you are pushing this boundary).
Now we use the free URL Toolbox app to parse out subdomains from the top level domains. We want to analyze each one, so that an attacker can't send mycompany.yourithelpdesk.com and get through, or mail.mycampany.com.
The field we are going to pass to the Levenshtein algorithm is domain_detected, so let's add each subdomain to the multi-value field domain_detected.
This step is not required, but I like to filter down the list of fields mid-search just to make it easier for me to read and track it. URL Toolbox adds a *lot* of fields, but these four are the only fields I care about from now on.
Last piece of prep -- let's simplify everything down to exactly the two fields that URL Toolbox's Levenshtein algorithm is expecting.
Now the real magic: URL Toolbox is given two multi-value fields, and it does the cross checking to calculate the Levenshtein score for each combination. We pull out the lowest score from this group.
Now we filter for a Levenshtein score less than three (so two or fewer changes required to go from the domain to one of our standard domains). Those who have used Levenshtein are likely thinking: "Wait, what about the > 0 that we always use?" -- we accomplished that by filtering out standard domains way back at the start.
Finally we do some | fields and | rename so that everything looks nice and friendly for analysts to understand what we're looking at.
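
The first three steps of the accelerated variant amount to a two-stage aggregation: counts per source address first, then those counts summed per domain. The Python below sketches that shape only. It does not reproduce tstats, and the function names `count_per_sender` and `count_per_domain` are hypothetical.

```python
from collections import Counter


def count_per_sender(senders):
    """Stage 1: one row per source address, with an email count."""
    return Counter(s.lower() for s in senders)


def count_per_domain(sender_counts):
    """Stage 2: extract the domain and sum the per-sender counts."""
    totals = Counter()
    for sender, count in sender_counts.items():
        domain = sender.rsplit("@", 1)[-1]
        totals[domain] += count  # sum, so no email is counted twice or lost
    return totals


senders = ["bob@spiunk.com", "bob@spiunk.com", "eve@spiunk.com",
           "alice@splunk.com"]
per_sender = count_per_sender(senders)
per_domain = count_per_domain(per_sender)
print(dict(per_domain))
```

Summing the existing counts in the second stage, rather than counting rows again, is what keeps the per-domain totals equal to the real number of emails.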