Hosts Where Security Sources Go Quiet
A frequent concern of SOCs is that their data feeds will disappear. This search will look on a host-by-host basis for when your security sources stop reporting home.
How to Implement
This search should work in any Splunk environment, since it relies only on Splunk internal fields. The only implementation detail to be aware of is that you need to specify the index and sourcetype of the security events you care about.
Known False Positives
This alert could generate false positives depending on how noisy the security log sources in your environment are. For example, if you are looking for AV to be disabled, but you typically only get an AV message every few days, the absence of messages would be difficult to detect reliably. Checking for the silence of sparse data sources is tricky to implement, and should only be attempted by advanced SPL users.
How To Respond
When this search returns values, capture the time of the event and the sourcetype that is missing. Contact the system owner. If it is known behavior, document this as well as when it is expected to be resolved. If not, further investigation is warranted to determine if the collection capability was modified, stopped, deleted or compromised.
Hosts Where Security Sources Go Quiet Help
This example leverages the Simple Search assistant. Here we look through all Splunk logs for hosts that are sending logs, but not sending Windows Security logs. For each host, we track the percentage of its events that are Windows Security logs over time, count how many past days that percentage was 0 (no Windows Security logs), and check whether the percentage was zero yesterday.
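To make the percentage concrete, here is a minimal Python sketch of the same idea outside of Splunk. It is an illustration only, not the search itself: the input row shape, the function name `daily_security_share`, and the `SECURITY_SOURCETYPES` set are hypothetical, and it assumes you already have per-host, per-day, per-sourcetype event counts.

```python
from collections import defaultdict

# Assumption: the sourcetype(s) you treat as "security"; adjust for your environment.
SECURITY_SOURCETYPES = {"WinEventLog:Security"}


def daily_security_share(rows):
    """Compute the fraction of each host's daily events that are security events.

    rows: iterable of (host, day, sourcetype, event_count) tuples (hypothetical shape).
    Returns {(host, day): fraction between 0.0 and 1.0}.
    """
    total = defaultdict(int)
    security = defaultdict(int)
    for host, day, sourcetype, count in rows:
        total[(host, day)] += count
        if sourcetype in SECURITY_SOURCETYPES:
            security[(host, day)] += count
    # A host/day only appears here if it sent at least one event of any kind,
    # which matches "hosts that are sending logs, but not sending security logs".
    return {
        key: security.get(key, 0) / total[key]
        for key in total
        if total[key] > 0
    }


if __name__ == "__main__":
    sample = [
        ("web01", "2024-01-01", "WinEventLog:Security", 25),
        ("web01", "2024-01-01", "syslog", 75),
        ("web01", "2024-01-02", "syslog", 40),
    ]
    for key, share in sorted(daily_security_share(sample).items()):
        print(key, share)
```

A share of 0.0 for a host and day means the host was still reporting, but none of its events came from the security source.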
SPL for Hosts Where Security Sources Go Quiet
Live Data (Auto Accelerated)
|First we pull a count of all events by host. (Note that when using back-to-back tstats, your field names need to be different, but you can't use the familiar 'count as count1' syntax, so here we use 'count(host)' to distinguish it from the 'count' in the next line.)|
|Now we pull a count of events for our in-scope security source(s) by host, here Windows Security logs.|
|Next we use stats to combine our two prestats=t tstats commands into one usable set of results. Whenever you use tstats prestats=t, you need a stats, chart, or similar command to pull the hidden prestats fields into the light so that you can use them.|
|From lines 1-3 we have a count of events per source (all vs. security) per host per day. Now we can calculate the percentage of logs that were WinSecurity on each of those days, for each of those hosts.|
|Technically this line isn't really required because we should be able to use now() in the next line, but I typically use it for uniformity with the demo datasets and in case you have some weird timezone hijinks in your environment.|
|Now we start the tricky piece. How do we figure out the business logic for when we want to be alerted to a security log going silent? At this stage, we're collecting a number of data points that we could use. For example, past_instances_of_no_logs tells us on how many past days we had no security logs from this host. If this is non-zero, then no logs today is much more likely to be benign. Similarly, we can use avg and stdev to gauge how wide the distribution typically is: if a host sometimes has tons of security events and sometimes has none, then a quiet day could be just chance.|
|Finally we apply our filtering. In testing, the most reliable approach seemed to be looking for hosts where we have a baseline (at least ten days, so we know something about this host), have some historical data (isnotnull(avg)), and have never seen a day with zero events before.|
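The baseline and filtering steps above can be sketched in Python as follows. This is a rough illustration of the logic, not the SPL from the search: `summarize_host`, `should_alert`, `days_observed`, and `MIN_SAMPLES` are hypothetical names. The sketch also assumes that the alert requires the most recent day's percentage to be zero, which the filter description does not state explicitly.

```python
from statistics import mean, stdev

# Assumption: mirrors the "at least ten days" baseline described above.
MIN_SAMPLES = 10


def summarize_host(daily_shares):
    """Summarize one host's daily security-log fractions.

    daily_shares: list of fractions, oldest first; the last entry is the
    most recent day and is kept out of the historical baseline.
    """
    if not daily_shares:
        raise ValueError("need at least one daily sample")
    *history, latest = daily_shares
    return {
        "days_observed": len(daily_shares),
        "latest": latest,
        # avg is None when there is no history, like a null avg in the search.
        "avg": mean(history) if history else None,
        # stdev needs at least two points; it is informational only here.
        "stdev": stdev(history) if len(history) > 1 else None,
        "past_instances_of_no_logs": sum(1 for share in history if share == 0),
    }


def should_alert(summary):
    """Alert only for hosts with a solid, never-silent baseline that just went quiet."""
    return (
        summary["days_observed"] >= MIN_SAMPLES
        and summary["avg"] is not None
        and summary["past_instances_of_no_logs"] == 0
        and summary["latest"] == 0  # assumption: the most recent day is silent
    )


if __name__ == "__main__":
    steady_then_silent = [0.3] * 12 + [0.0]
    often_silent = [0.3, 0.0] * 6 + [0.0]
    print(should_alert(summarize_host(steady_then_silent)))
    print(should_alert(summarize_host(often_silent)))
```

Requiring zero past silent days trades recall for precision: hosts with sparse security logging never alert, which is in line with the false-positive caveat about sparse data sources.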