O365 Logs

Data Source Onboarding Guide Overview

Overview  

Welcome to the Splunk Data Source Onboarding Guides (DSOGs)!

Splunk has lots of docs, so why are we creating more? The primary goal of the DSOGs is to provide you with a curated, easy-to-digest view of the most common ways that Splunk users ingest data from our most popular sources, including how to configure the systems that will send us data (such as turning on AWS logging or Windows Security's process-launch logs, for example). While these guides won't cover every single possible option for installation or configuration, they will give you the most common, easiest way forward.

How to use these docs: We've broken the docs out into different segments that get linked together. Many of them will be shared across multiple products. We suggest clicking the "Mark Complete" button above to remind yourself of those you've completed. Since this info will be stored locally in your browser, you won't have to worry about it affecting anyone else's view of the document. And when you're reading about ingesting Sysmon logs, for example, it's a convenient way to keep track of the fact that you already installed the forwarder in order to onboard your Windows Security logs.

So, go on and dive right in! And don't forget, Splunk is here to make sure you're successful. Feel free to ask questions of your Sales Engineer or Professional Services Engineer, if you run into trouble. You can also look for answers or post your questions on https://answers.splunk.com/.

General Infrastructure

Instruction Expectations and Scaling  

Expectations

This doc is intended to be an easy guide to onboarding data into Splunk, as opposed to a comprehensive set of docs. We've specifically chosen only straightforward technologies to implement here (avoiding ones that have lots of complications), but if at any point you feel like you need more traditional documentation for the deployment or usage of Splunk, Splunk Docs has you covered with over 10,000 pages of docs (not counting other languages!).

Because simpler is almost always better when getting started, we are also not worrying about more complicated capabilities like Search Head Clustering, Indexer Clustering, or anything else in a similar vein. If you do have those requirements, Splunk Docs is a great place to get started, and you can also always avail yourself of Splunk Professional Services so that you don't have to worry about any of the setup.

Scaling

While Splunk scales to hundreds or thousands of indexers with ease, we usually have some pretty serious architecture conversations before ordering tons of hardware. That said, these docs aren't just for lab installs. We've found that they will work just fine with most customers in the 5 GB to 500 GB range, and even some larger ones! Regardless of whether you have a single Splunk box doing everything, or a distributed install with a Search Head and a set of Indexers, you should be able to get the data and the value flowing quickly.

One important note: the first orchestration request we get as customers scale is to distribute configurations across many different universal forwarders. Imagine that you've just vetted the Windows Process Launch Logs guide on a few test systems, and it's working great. Now you want to deploy it to 500, or 50,000, other Windows boxes. There are a variety of ways to do this:

  • The standard Splunk answer is to use the Deployment Server. The deployment server is designed for exactly this task, and is free with Splunk. We aren't going to document it here, mostly because it's extremely well documented by Splunk EDU and on docs.splunk.com.
  • If you are a decent-sized organization, you've probably already got a way to deploy configurations and code, like Puppet, Chef, SCCM, or Ansible. All of those tools are used to deploy Splunk on a regular basis. You might not want to go down this route if it requires onerous change control or reliance on other teams: many large Splunk environments with well-developed software deployment systems prefer to use the Deployment Server because it can be owned by the Splunk team and is optimized for Splunk's needs. But many customers are very happy with using Puppet to distribute Splunk configurations.

Ultimately, Splunk configurations are almost all just text files, so you can distribute the configurations with our packaged software, with your own favorite tools, or even by just copying configuration files around.
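
As a minimal sketch of the last option (copying files around), the shell function below prints the commands that would copy one app directory to each Linux forwarder and restart it. The app directory name, the host names, and the use of scp and ssh are assumptions for illustration; the function only echoes the commands so you can review them before running anything.

```shell
# Print (not run) the commands that would push one app directory to each
# forwarder and restart it. Arguments: app directory, then host names.
# The target path assumes the default Linux forwarder install location.
print_deploy_commands() {
    app_dir=$1
    shift
    for host in "$@"; do
        echo "scp -r $app_dir $host:/opt/splunkforwarder/etc/apps/"
        echo "ssh $host /opt/splunkforwarder/bin/splunk restart"
    done
}

# Hypothetical app directory and hosts.
print_deploy_commands ./my_outputs_app web01 web02
```

For more than a handful of systems, the Deployment Server or your existing configuration management tool is likely to be a better fit than a loop like this.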

Indexes and Sourcetypes Overview  

Overview

The DSOGs talk a lot about indexes and sourcetypes. Here's a quick overview.

Splexicon (Splunk's Lexicon, a glossary of Splunk-specific terms) defines an index as the repository for data in Splunk Enterprise. When Splunk Enterprise indexes raw event data, it transforms the data into searchable events. Indexes are the collections of flat files on the Splunk Enterprise instance. That instance is known as an Indexer because it stores data. Splunk instances that users log into and run searches from are known as Search Heads. When you have a single instance, it takes on both the search head and indexer roles.

"Sourcetype" is defined as a default field that identifies the data structure of an event. A sourcetype determines how Splunk Enterprise formats the data during the indexing process. Example sourcetypes include access_combined and cisco_syslog.

In other words, an index is where we store data, and the sourcetype is a label given to similar types of data. All Windows Security Logs will have a sourcetype of WinEventLog:Security, which means you can always search for sourcetype=wineventlog:security (when searching, the field name sourcetype is case sensitive; the value is not).

Why is this important? We're going to guide you to use indexes that our professional services organization recommends to customers as an effective starting point. Using standardized sourcetypes (those shared by other customers) makes it much easier to use Splunk and avoid headaches down the road. Splunk will allow you to use any sourcetype you can imagine, which is great for custom log sources, but for common log sources, life is easier sticking with standard sourcetypes. These docs will walk you through standard sourcetypes.

Implementation

Below is a sample indexes.conf that will prepare you for all of the data sources we use in these docs. You will note that we separate OS logs from Network logs and Security logs from Application logs. The idea here is to separate them for performance reasons, but also for isolation purposes: you may want to expose the application or system logs to people who shouldn't view security logs. Putting them in separate indexes prevents that.

To install this configuration, you should download the app below and put it in the apps directory.

For Windows systems, this will typically be: c:\Program Files\Splunk\etc\apps. Once you've extracted the app there, you can restart Splunk via the Services Control Panel applet, or by running "c:\Program Files\Splunk\bin\splunk.exe" restart.

For Linux systems, this will typically be /opt/splunk/etc/apps/. Once you've extracted the app there, you can restart Splunk by running /opt/splunk/bin/splunk restart.

You can view the indexes.conf below, but it's easiest to use the "Click here to download a Splunk app with this indexes.conf" link.

Splunk Cloud Customers: You won't copy the files onto your Splunk servers because you don't have access. You could go one-by-one through the UI and create all of the indexes below, but it might be easiest if you download the app, and open a ticket with CloudOps to have it installed.


Sample indexes.conf
# Overview. Below you will find the basic indexes.conf settings for
# setting up your indexes in Splunk. We separate into different indexes 
# to allow for performance (in some cases) or data isolation in others. 
# All indexes come preconfigured with a relatively short retention period 
# that should work for everyone, but if you have more disk space, we 
# encourage (and usually see) longer retention periods, particularly 
# for security customers.

# Endpoint Indexes used for Splunk Security Essentials. 
# If you have the sources, other standard indexes we recommend include:
# epproxy - Local Proxy Activity

[epav]
coldPath = $SPLUNK_DB/epav/colddb
homePath = $SPLUNK_DB/epav/db
thawedPath = $SPLUNK_DB/epav/thaweddb
frozenTimePeriodInSecs = 2592000
#30 days

[epfw]
coldPath = $SPLUNK_DB/epfw/colddb
homePath = $SPLUNK_DB/epfw/db
thawedPath = $SPLUNK_DB/epfw/thaweddb
frozenTimePeriodInSecs = 2592000
#30 days

[ephids]
coldPath = $SPLUNK_DB/ephids/colddb
homePath = $SPLUNK_DB/ephids/db
thawedPath = $SPLUNK_DB/ephids/thaweddb
frozenTimePeriodInSecs = 2592000
#30 days

[epintel]
coldPath = $SPLUNK_DB/epintel/colddb
homePath = $SPLUNK_DB/epintel/db
thawedPath = $SPLUNK_DB/epintel/thaweddb
frozenTimePeriodInSecs = 2592000
#30 days

[oswin]
coldPath = $SPLUNK_DB/oswin/colddb
homePath = $SPLUNK_DB/oswin/db
thawedPath = $SPLUNK_DB/oswin/thaweddb
frozenTimePeriodInSecs = 2592000
#30 days

[oswinsec]
coldPath = $SPLUNK_DB/oswinsec/colddb
homePath = $SPLUNK_DB/oswinsec/db
thawedPath = $SPLUNK_DB/oswinsec/thaweddb
frozenTimePeriodInSecs = 2592000
#30 days

[oswinscript]
coldPath = $SPLUNK_DB/oswinscript/colddb
homePath = $SPLUNK_DB/oswinscript/db
thawedPath = $SPLUNK_DB/oswinscript/thaweddb
frozenTimePeriodInSecs = 2592000
#30 days

[oswinperf]
coldPath = $SPLUNK_DB/oswinperf/colddb
homePath = $SPLUNK_DB/oswinperf/db
thawedPath = $SPLUNK_DB/oswinperf/thaweddb
frozenTimePeriodInSecs = 604800 
#7 days

[osnix]
coldPath = $SPLUNK_DB/osnix/colddb
homePath = $SPLUNK_DB/osnix/db
thawedPath = $SPLUNK_DB/osnix/thaweddb
frozenTimePeriodInSecs = 2592000
#30 days

[osnixsec]
coldPath = $SPLUNK_DB/osnixsec/colddb
homePath = $SPLUNK_DB/osnixsec/db
thawedPath = $SPLUNK_DB/osnixsec/thaweddb
frozenTimePeriodInSecs = 2592000
#30 days

[osnixscript]
coldPath = $SPLUNK_DB/osnixscript/colddb
homePath = $SPLUNK_DB/osnixscript/db
thawedPath = $SPLUNK_DB/osnixscript/thaweddb
frozenTimePeriodInSecs = 2592000
#30 days

[osnixperf]
coldPath = $SPLUNK_DB/osnixperf/colddb
homePath = $SPLUNK_DB/osnixperf/db
thawedPath = $SPLUNK_DB/osnixperf/thaweddb
frozenTimePeriodInSecs = 604800 
#7 days

# Network Indexes used for Splunk Security Essentials
# If you have the sources, other standard indexes we recommend include:
# netauth - for network authentication sources
# netflow - for netflow data
# netids - for dedicated IPS environments
# netipam - for IPAM systems
# netnlb - for non-web server load balancer data (e.g., DNS, SMTP, SIP, etc.)
# netops - for general network system data (such as Cisco IOS non-netflow logs)
# netvuln - for Network Vulnerability Data

[netdns]
coldPath = $SPLUNK_DB/netdns/colddb
homePath = $SPLUNK_DB/netdns/db
thawedPath = $SPLUNK_DB/netdns/thaweddb
frozenTimePeriodInSecs = 2592000
#30 days

[mail]
coldPath = $SPLUNK_DB/mail/colddb
homePath = $SPLUNK_DB/mail/db
thawedPath = $SPLUNK_DB/mail/thaweddb
frozenTimePeriodInSecs = 2592000
#30 days

[netfw]
coldPath = $SPLUNK_DB/netfw/colddb
homePath = $SPLUNK_DB/netfw/db
thawedPath = $SPLUNK_DB/netfw/thaweddb
frozenTimePeriodInSecs = 2592000
#30 days

[netops]
coldPath = $SPLUNK_DB/netops/colddb
homePath = $SPLUNK_DB/netops/db
thawedPath = $SPLUNK_DB/netops/thaweddb
frozenTimePeriodInSecs = 2592000
#30 days

[netproxy]
coldPath = $SPLUNK_DB/netproxy/colddb
homePath = $SPLUNK_DB/netproxy/db
thawedPath = $SPLUNK_DB/netproxy/thaweddb
frozenTimePeriodInSecs = 2592000
#30 days

[netvpn]
coldPath = $SPLUNK_DB/netvpn/colddb
homePath = $SPLUNK_DB/netvpn/db
thawedPath = $SPLUNK_DB/netvpn/thaweddb
frozenTimePeriodInSecs = 2592000
#30 days


# Splunk Security Essentials doesn't have examples of Application Security, 
# but if you want to ingest those logs, here are the recommended indexes:
# appwebint - Internal WebApp Access Logs
# appwebext - External WebApp Access Logs
# appwebintrp - Internal-facing Web App Load Balancers
# appwebextrp - External-facing Web App Load Balancers
# appwebcdn - CDN logs for your website
# appdbserver - Database Servers
# appmsgserver - Messaging Servers
# appint - App Servers for internal-facing apps 
# appext - App Servers for external-facing apps 
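
The retention values in the sample above are day counts expressed in seconds (30 days = 2592000, 7 days = 604800). If you want a longer retention period or one of the other recommended indexes, the following sketch prints a stanza in the same shape. The function names and the 90-day figure are illustrative assumptions; epproxy is one of the additional indexes named in the comments above.

```shell
# Convert a retention period in days to the seconds value that
# frozenTimePeriodInSecs expects (1 day = 86400 seconds).
days_to_secs() {
    echo $(( $1 * 86400 ))
}

# Print an index stanza in the same layout as the sample indexes.conf.
# Arguments: index name, retention in days.
make_index_stanza() {
    name=$1
    days=$2
    printf '[%s]\n' "$name"
    printf 'coldPath = $SPLUNK_DB/%s/colddb\n' "$name"
    printf 'homePath = $SPLUNK_DB/%s/db\n' "$name"
    printf 'thawedPath = $SPLUNK_DB/%s/thaweddb\n' "$name"
    printf 'frozenTimePeriodInSecs = %s\n' "$(days_to_secs "$days")"
    printf '#%s days\n' "$days"
}

# Example: an epproxy index kept for 90 days.
make_index_stanza epproxy 90
```

The printed stanza can be appended to the indexes.conf in the app before you install it.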

Validation

Once this is complete, you will be able to find the list of indexes that the system is aware of by logging into Splunk, and going into Settings -> Indexes.
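
To compare that list against what you intended to create, a small helper like the one below can print the index names defined in an indexes.conf file. This is only a sketch: the function name is made up for illustration, and the path in the usage comment depends on what the app directory is called on your system.

```shell
# List the index names (stanza headers) defined in an indexes.conf file,
# one per line, so they can be compared with Settings -> Indexes.
list_index_stanzas() {
    sed -n 's/^\[\(.*\)][[:space:]]*$/\1/p' "$1"
}

# Usage (replace <app> with the directory name of the app you installed):
#   list_index_stanzas /opt/splunk/etc/apps/<app>/local/indexes.conf
```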

Forwarder on Linux Systems  

Overview

Installing the Linux forwarder is a straightforward process, similar to installing any Linux program. These instructions will walk you through a manual installation (perfect for a lab, a few laptops, or when you're just getting started). You will have three options for how to proceed: using an RPM package (easiest for any Red Hat or similar system with rpm), using a DEB package (easiest for any Ubuntu or similar system with dpkg), or using just the compressed .tgz file (will work across Linux platforms).

Note: For full and latest information on installing a forwarder, please follow the instructions in the Linux installation manual:
http://docs.splunk.com/Documentation/Forwarder/latest/Forwarder/Installanixuniversalforwarder

Implementation

Prerequisites
  1. You will need elevated permissions to install the software and configure it correctly.

Installation using an RPM file:

Make sure you have downloaded the universal forwarder package from Splunk’s website: https://www.splunk.com/en_us/download/universal-forwarder.html and have it on the system you want to install Splunk on.

Run: rpm -i splunkforwarder<version>.rpm

This will install the Splunk forwarder into the default directory of /opt/splunkforwarder

To enable Splunk to run each time your server is restarted use the following command:
    /opt/splunkforwarder/bin/splunk enable boot-start

Installation using a DEB file:

Make sure you have downloaded the universal forwarder package from Splunk’s website: https://www.splunk.com/en_us/download/universal-forwarder.html and have it on the system on which you want to install Splunk.

Run: dpkg -i splunkforwarder<version>.deb

This will install the Splunk forwarder into the default directory of /opt/splunkforwarder

To enable Splunk to run each time your server is restarted use the following command:
    /opt/splunkforwarder/bin/splunk enable boot-start

Installation using the .tgz file:

Make sure you have copied the tarball (or the appropriate package for your system) to the host, then extract it into the /opt directory.

Run: tar zxvf <splunk_tarball_file.tgz> -C /opt

[root@ip-172-31-94-210 ~]# tar zxvf splunkforwarder-7.0.1-2b5b15c4ee89-Linux-x86_64.tgz -C /opt
splunkforwarder/
splunkforwarder/etc/
splunkforwarder/etc/deployment-apps/
splunkforwarder/etc/deployment-apps/README
splunkforwarder/etc/apps/

Check your extraction:

Run: ls -l /opt

[root@ip-172-31-94-210 apps]# ls -l /opt
total 8
drwxr-xr-x 8 splunk splunk 4096 Nov 29 20:21 splunkforwarder

If you would like Splunk to run at startup, execute the following command:
    /opt/splunkforwarder/bin/splunk enable boot-start
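
The three installation options above differ only in the command used to unpack the package. As a rough sketch, the function below picks the matching command from the package file name. It only prints the command rather than running it, and the function name is an illustrative assumption.

```shell
# Print the install command that matches a downloaded forwarder package.
# This only echoes the command; run it yourself with elevated permissions.
install_command_for() {
    pkg=$1
    case "$pkg" in
        *.rpm) echo "rpm -i $pkg" ;;
        *.deb) echo "dpkg -i $pkg" ;;
        *.tgz) echo "tar zxvf $pkg -C /opt" ;;
        *)     echo "unsupported package type: $pkg" >&2; return 1 ;;
    esac
}

# Example using the tarball name from the extraction output above.
install_command_for splunkforwarder-7.0.1-2b5b15c4ee89-Linux-x86_64.tgz
```

Whichever command you use, the forwarder ends up in /opt/splunkforwarder, so the boot-start command is the same in all three cases.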

Wrap Up

After following any of the above three options, you will have a fully installed Splunk forwarder. There are three more steps you’ll want to take before you can see the data in Splunk:

  • You will need an outputs.conf to tell the forwarder where to send data (next section)
  • You will need an inputs.conf to tell the forwarder what data to send (below, in the "Splunk Configuration for Data Source")
  • You will need an indexes.conf on the indexers to tell them where to put the data received. (You just passed that section.)

Sending Data from Forwarders to Indexers  

Overview

Every Splunk system in the environment that is not an indexer (i.e., any system that doesn't store its data locally) should have an outputs.conf that points to your indexers. That includes a Universal Forwarder on a Windows host, a Linux Heavy-Weight Forwarder pulling the more difficult AWS logs, and even a dedicated Search Head that dispatches searches to your indexers.

Implementation

Fortunately the outputs.conf will be the same across the entire environment, and is fairly simple. There are three steps:

  1. Create the app using the button below (SplunkCloud customers: use the app you received from SplunkCloud).
  2. Extract the file (it will download a zip file).
  3. Place in the etc/apps directory.

For Windows systems, this will typically be: c:\Program Files\Splunk\etc\apps. Once you've extracted the app there, you can restart Splunk via the Services Control Panel applet, or by running "c:\Program Files\Splunk\bin\splunk.exe" restart.

For Linux systems, this will typically be /opt/splunkforwarder/etc/apps/. Once you've extracted the app there, you can restart Splunk by running /opt/splunkforwarder/bin/splunk restart.

For customers not using SplunkCloud:

Sample outputs.conf
[tcpout]
defaultGroup = default-autolb-group

[tcpout:default-autolb-group]
server = MySplunkServer.mycompany.local:9997

[tcpout-server://MySplunkServer.mycompany.local:9997]
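
If you have more than one indexer, the server setting accepts a comma-separated list and the forwarder load-balances across it. The sketch below writes such an outputs.conf into an app's local directory. The app name and host names are hypothetical, and the example builds the app in a scratch directory rather than in etc/apps so that nothing is changed on a live system.

```shell
# Write local/outputs.conf inside an app directory, listing one or more
# indexers. Arguments: app directory, then host:port pairs.
write_outputs_conf() {
    app_dir=$1
    shift
    servers=$(printf '%s, ' "$@")
    servers=${servers%, }          # drop the trailing ", "
    mkdir -p "$app_dir/local"
    cat > "$app_dir/local/outputs.conf" <<EOF
[tcpout]
defaultGroup = default-autolb-group

[tcpout:default-autolb-group]
server = $servers
EOF
}

# Example: build the app in a scratch directory, then inspect the result.
scratch=$(mktemp -d)
write_outputs_conf "$scratch/my_outputs_app" idx1.mycompany.local:9997 idx2.mycompany.local:9997
cat "$scratch/my_outputs_app/local/outputs.conf"
```

Once the file looks right, copy the app directory into etc/apps on each non-indexer system and restart Splunk as described above.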


Validation

Run a search in the Splunk environment for the host you've installed the forwarder on. E.g., index=* host=mywinsystem1*

You can also review all hosts that are sending data via | metadata index=* type=hosts

System Configuration

Office 365 Overview  

The Office365 Reporting Add-on lets you collect Exchange message-tracking logs by querying the Office 365 Reporting web service API and indexing the results.

Exchange message-tracking logs record email message activity as messages flow through the transport pipeline on Exchange mail servers. These logs are particularly helpful not only for Exchange troubleshooting and diagnostics, but also from a security-operations perspective.

They can help you:

  • Find out what happened to a message sent by a specific sender.
  • Find out if a transport rule acted on a message.
  • Find out if a message sent from an Internet sender made it into your Exchange organization.
  • Correlate sender domains against threat intelligence or look for non-standard senders.

Validate Office 365 Permissions  

The Office365 Reporting Add-on requires an Exchange admin account to query the message trace APIs to retrieve data.

To validate that the account you are using has sufficient access:

  1. Log in to https://portal.office.com
  2. Access the Exchange Admin Center
  3. Select mail flow, then message trace. If you're able to successfully run a message trace, the account will suffice.

    Confirmed -- we're able to configure Message Traces, so our account works.

Splunk Configuration for Data Source

Sizing Estimate  

There is a large amount of variability in the volume of O365 logs. There are several areas that impact volumes:

  • Subscription type and Workloads (Apps) used
  • Size of organization
  • O365 adoption inside of the organization
  • Kinds of federation / ADsync / ExpressRoute, etc.

Message trace events tend to be about 650 bytes each, with multiple events per email. Management logs tend to be about 1200 bytes each, and Azure Audit logs tend to be north of 3000 bytes each.

If you're trying to use this information to size your environment accurately, unfortunately there's no easy way to do so without ingesting the data. Most customers determine their ingest volume by starting to ingest data and then looking at how much has arrived.
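
For a rough first guess before you ingest anything, you can multiply the per-event size above by an expected event count. The sketch below does this for message trace data only. The email count and the number of events per email are made-up inputs, and the result is a ballpark figure rather than a sizing guarantee.

```shell
# Rough daily volume for message trace data, in bytes.
# Arguments: emails per day, trace events per email.
# 650 is the approximate per-event size quoted above.
estimate_trace_bytes_per_day() {
    emails=$1
    events_per_email=$2
    echo $(( emails * events_per_email * 650 ))
}

# Hypothetical inputs: 100,000 emails per day, 3 trace events per email.
bytes=$(estimate_trace_bytes_per_day 100000 3)
echo "$bytes bytes/day, about $(( bytes / 1024 / 1024 )) MiB/day"
```

Management and Azure Audit logs would need their own terms using the larger per-event sizes quoted above.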

Where to Collect  

Pulling logs from Office 365 requires a web service. It is functionally very different from grabbing data from local logs or events, since it must be configured via the Office 365 Reporting Add-on on a Splunk box with a web UI. It's deployed in one of the following two ways:

  1. Single instance: Splunk customers who have a smaller Splunk load that fits on a single system often add the Technology Add-on (TA) to the same system. Sizing here is environment specific, so you will want to ensure adequate performance (although this setup is usually quite workable in smaller environments). If you need to, you can always redo the configuration later, using a dedicated heavy forwarder.
  2. Heavy forwarder: In most environments, customers will install the TA on a dedicated heavy forwarder. A heavy forwarder is just like a normal Splunk install (that is, not a universal forwarder), but its only role is to pull in data from special sources and send it to indexers.

The Office 365 Reporting Add-on requires Internet connectivity to run REST API queries to the reporting web service.

While, generally speaking, it is Splunk best practice to install TAs across all parts of your Splunk environment (particularly props and transforms), in the case of the Office 365 Reporting Add-on, we will be reaching out to a cloud service, which makes the configuration slightly different. We separate out installing the TA from configuring the inputs.

Configuring the inputs: You will only configure the inputs on one system in your environment, such as a heavy forwarder or a single instance. (See "Overview" for more detail.)

Installing the TA: The TA itself should reside wherever you configure the inputs (since the TA is the mechanism that allows you to configure the inputs). If you have a larger or more advanced environment where you configure the inputs on a heavy forwarder, you should also install the TA on your search heads, so you can see the Office 365 field extractions.

Advanced tip: Hide the app on your search heads, so you don’t accidentally reconfigure and duplicate your data later. To do this, click the app dropdown in the upper left-hand corner of the screen, select Manage Apps, then select Edit Properties next to the Office 365 Reporting Add-on. Next, click Visible: No and then save.

The following table provides a reference for installing this specific add-on to a distributed deployment of Splunk Enterprise:

Splunk Platform Component    Supported?    Required
Search heads                 Yes           Yes
Heavy forwarders             Yes           Depends on size
Indexers                     Yes           No
Universal forwarders         No            No

Install the Technology Add-On -- TA  

Log into Splunk and click Splunk Apps.

Search for "Office 365 Reporting." Click the Install button.

After installation, click Restart Now.

Log back into Splunk and select the Microsoft Office 365 Reporting Add-on app.

Splunk Cloud Customers: you won't be copying any files or folders to your indexers or search heads, but good news! Even though the Office 365 Reporting Add-on is not Cloud Self-Service Enabled, you will still be able to open a ticket with Cloud Ops and be ready to go in short order.

O365 Indexes and Sourcetypes  

Overview

Amongst Splunk’s 15,000+ customers, we’ve done a lot of implementations, and we’ve learned a few things along the way. While you can use any sourcetypes or indexes that you want in the "land of Splunk," we’ve found that the most successful customers follow specific patterns, as it sets them up for success moving forward.

Implementation

Here is a table of sourcetypes and indexes we recommend. If you have already followed the recommended indexes.conf setup above, then the index is already configured for you and everything will run automatically. If you are blazing your own path, we strongly recommend creating an index called “mail” now.

Data source              Description                                          Sourcetype                        Index
ms_o365_message_trace    REST API data from the O365 reporting web service    ms:o365:reporting:messagetrace    mail

Configuration  

Implementation

  1. In the Microsoft Office 365 Reporting Add-on for Splunk, select Configuration in the navigation, and then Add to create a new O365 configuration.
  2. Enter Name, Username, and Password for the account you validated above. Select Add.
  3. Now that the account is configured, select the Inputs tab, then Create New Input.
  4. Enter Name and Interval. Select Index and Office365 Account. Enter Start date/time and select Add.

    Note: Depending on the size of the environment, you may run into issues with Azure limits when trying to retrieve too many previous events. If historical data is not essential, set the start date/time as the current day.

Validation

Validate the input and confirm the data is being ingested by running the following search: index=mail sourcetype=ms:o365:reporting:messagetrace
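
If you would rather see a count than scroll through raw events, a variation on that search can be assembled as shown below. This is a minimal sketch: the function name and the 24-hour window are assumptions, and the function only prints the search string so you can paste it into the search bar.

```shell
# Build a search that counts the last 24 hours of events for an
# index/sourcetype pair. Arguments: index, sourcetype.
build_validation_search() {
    echo "index=$1 sourcetype=$2 earliest=-24h | stats count"
}

build_validation_search mail ms:o365:reporting:messagetrace

# To run it with the Splunk CLI instead of the search bar (the path
# assumes a default Linux install; the CLI prompts for credentials):
#   /opt/splunk/bin/splunk search "$(build_validation_search mail ms:o365:reporting:messagetrace)"
```

A count of zero shortly after configuration may simply mean the first polling interval has not elapsed yet.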

O365 Message Traces Successfully Ingested!
