Splunk and cPanel: analyzing cPanel domain (Apache) logs

Splunk is a wonderful tool for mining Apache access logs and obtaining other useful information. Out of the box, cPanel writes each domain's access log (Apache log) to its own file, named after the domain, in /etc/apache2/logs/domlogs. This blog post assumes you are already forwarding and indexing your cPanel domain/Apache logs in Splunk with a very basic inputs.conf monitor stanza like:

# Apache logs for each domain in cPanel
# Note: Slightly different path to avoid symlinks
[monitor:///var/log/apache2/domlogs/*.*]
sourcetype=apache:access
index=neil-test
disabled=0

Splunk can easily extract fields (like domain or vhost) from the log entries themselves, but in the case of cPanel the vhost or domain is not actually contained in the log entries. It can, however, easily be extracted from the name of the log file (the source field).

$ cd /etc/apache2/logs/domlogs
$ ls | grep neilautossl

neilautossl.nsabol.net
neilautossl.nsabol.net-bytes_log
neilautossl.nsabol.net-ssl_log

In the example above, the logs for the domain “neilautossl.nsabol.net” are shown in their native path with their native names.

Splunk search (with field extraction for vhost or domain) for HTTP (port 80) access logs

index="neil-test" source!="/var/log/apache2/domlogs/*-ssl_log"
source!="/var/log/apache2/domlogs/*-bytes_log" source!="/var/log/apache2/domlogs/ftpxferlog*" |
eval temp=split(source,"/") | eval domain=mvindex(temp,5) | fields - temp

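Nothing fancy here. I exclude the SSL, FTP transfer, and bytes logs with source!= and a wildcard, then split the source field on / and take the element at index 5. mvindex counts from zero and the leading / produces an empty first element, so index 5 is the name of the log file, i.e. the domain. Note that this index matches the depth of the monitored path (/var/log/apache2/domlogs) and would change if you monitor a different path. Since I used a field named temp for the split, I remove it from the resulting fields at the end.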
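To see which element the split and mvindex steps pick without touching any indexed data, the same eval logic can be run against a hard-coded path. This is only a sketch: the source value is copied from the listing earlier in this post, and the field names by_position and last_element are made up for illustration. mvindex also accepts negative indexes, so -1 selects the last element regardless of how deep the path is.

```spl
| makeresults
| eval source="/var/log/apache2/domlogs/neilautossl.nsabol.net"
| eval temp=split(source,"/")
| eval by_position=mvindex(temp,5), last_element=mvindex(temp,-1)
| fields - temp
```

Both fields should hold the log file name. Using -1 instead of 5 may be the safer choice if the monitored path ever changes.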

Splunk search (with field extraction for vhost or domain) for HTTPS/SSL (port 443) access logs

index="neil-test" source="/var/log/apache2/domlogs/*-ssl_log" |
eval temp=split(source,"/") | eval domain=mvindex(temp,5) | eval domain=replace(domain,"-ssl_log","") |
fields - temp

This is basically the same as above, except that I include only the SSL logs (ending in -ssl_log) and strip -ssl_log from the file name with replace to obtain the real domain name. This one is a little cleaner than the first search, since all HTTPS vhost logs end with the same suffix.

Splunk search (with field extraction for vhost or domain) for combined (HTTP and HTTPS) access logs

index="neil-test" source!="/var/log/apache2/domlogs/*-bytes_log"
source!="/var/log/apache2/domlogs/ftpxferlog*" | eval temp=split(source,"/") |
eval domain=mvindex(temp,5) | eval domain=replace(domain,"-ssl_log","") | fields - temp

This last example combines the first two searches to cover domain access for both HTTP and HTTPS (SSL) vhosts.
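As a possible alternative to the split, mvindex, and replace chain, a single rex against the source field could pull out the domain and drop the optional -ssl_log suffix in one step. This is a sketch I have not run against these logs, and the regular expression is my own assumption about the file naming shown earlier (one file per domain directly under domlogs, with -ssl_log appended for HTTPS).

```spl
index="neil-test" source!="/var/log/apache2/domlogs/*-bytes_log"
source!="/var/log/apache2/domlogs/ftpxferlog*"
| rex field=source "/domlogs/(?<domain>[^/]+?)(?:-ssl_log)?$"
```

Because the pattern anchors on the domlogs directory name instead of a fixed position, it does not depend on how deep the monitored path is, and it leaves no temporary field to clean up.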

Resulting domain field

[Screenshot: resulting extracted domain field]

Now, you can perform further analysis and mining using the domain field.
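For example, a request count per domain can be built on top of the combined search. This is a minimal sketch: the field name requests is just an illustrative label, and the index and paths are the ones used throughout this post.

```spl
index="neil-test" source!="/var/log/apache2/domlogs/*-bytes_log"
source!="/var/log/apache2/domlogs/ftpxferlog*"
| eval temp=split(source,"/")
| eval domain=mvindex(temp,5)
| eval domain=replace(domain,"-ssl_log","")
| stats count AS requests BY domain
| sort - requests
```

The stats command keeps only the fields it produces, so the fields - temp step is not needed here.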

Conclusion

There is most definitely a cleaner way to handle this in configuration, but in a pinch, this got me what I needed without asking my Splunk admin for any changes. My hope is that it helps someone else with a similar requirement.

If you have a slicker way to handle this, please share in the comments.