Log File Automation

When I set up Splunk reporting for my website, it was a purely manual process, and I left pulling the logs automatically as a goal for the future. Since then, that’s exactly what I’ve done, and the whole thing now runs without any intervention. Below is how.

The first thing I needed to do was create SSH keys so that I could log into my website without any user intervention. Following this how-to from HostGator, I used PuTTYGen to create the key pair, then I placed the public key on my website. After changing my firewall to allow communication over port 2222, I could SSH to my website over that port (instead of port 22). To automate the process on my server, I created a config file in the ~/.ssh directory so that SSH would automatically use the right private key file when connecting. Step one was done.
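To make the config-file step concrete, here is a minimal sketch of what such an entry could look like, rendered from Python so the pieces are easy to see. The alias mysite, the host example.com, the user siteuser, and the key path are all made up for illustration; only port 2222 comes from the setup described above. One assumption worth stating: OpenSSH needs the private key in OpenSSH format, so a key saved in PuTTYGen's default .ppk format would have to be exported first.

```python
def ssh_config_block(alias, hostname, user, port, identity_file):
    """Render one Host stanza in the format used by ~/.ssh/config."""
    lines = [
        f"Host {alias}",
        f"    HostName {hostname}",
        f"    User {user}",
        f"    Port {port}",
        f"    IdentityFile {identity_file}",
        # Only offer the key named above instead of every key the agent holds.
        "    IdentitiesOnly yes",
    ]
    return "\n".join(lines) + "\n"


# All values except the port are hypothetical placeholders.
block = ssh_config_block(
    alias="mysite",
    hostname="example.com",
    user="siteuser",
    port=2222,
    identity_file="~/.ssh/mysite_key",
)
print(block)
```

With a stanza like this in place, running ssh mysite picks up the port, user, and key automatically, which is what lets a script connect unattended.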

Step two was to actually grab the files. For this, I finally settled on rsync, a very handy *nix file-syncing utility. I wrote a bash script that uses rsync over SSH to pull down both the error.php log and the Apache logs for my website (for the latter, I configured cPanel to create a ZIP file every day containing the day’s access logs). Some sed work, file renaming, and moving then happens, and the end result is that those files land in a directory that Splunk is monitoring. Splunk then sees the new files and indexes them appropriately. Voilà!
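The actual script is bash and isn't reproduced here, but the rsync-over-SSH pull at its core can be sketched in a few lines of Python. This is an illustration under assumptions, not the real script: the SSH alias mysite, the remote path logs/, and the local directory /tmp/site-logs/ are hypothetical, and the sed and renaming steps are left out.

```python
import shlex


def build_rsync_command(remote, remote_path, local_dir):
    """Build an rsync command that pulls remote_path into local_dir over SSH.

    -a  archive mode (recursive, preserves timestamps and permissions)
    -z  compress data in transit
    -e  remote shell to use; plain "ssh" lets ~/.ssh/config supply port and key
    """
    return ["rsync", "-az", "-e", "ssh", f"{remote}:{remote_path}", local_dir]


# Hypothetical alias and paths, for illustration only.
cmd = build_rsync_command("mysite", "logs/", "/tmp/site-logs/")
print(shlex.join(cmd))

# To actually run the transfer:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Pulling into a staging directory first and only then moving finished files into the folder Splunk monitors is one way to keep Splunk from indexing a half-transferred file.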

I set up a cron job to run this as the splunk user once a day. It’s not real time, but it’s as close as you can get given the limitations around that Apache log. If, for some reason, the daily job doesn’t run correctly, there’s actually no problem: the next run will pick up the data with no loss, since both the access logs and the error.php file will simply keep accumulating until I grab them.
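For anyone unfamiliar with cron syntax, a once-a-day entry has five time fields (minute, hour, day of month, month, day of week) followed by the command. The small sketch below just formats such a line; the 02:30 run time and the script path /home/splunk/bin/pull_logs.sh are invented for the example, not taken from my setup.

```python
def daily_cron_line(hour, minute, command):
    """Format a crontab entry that runs command once a day at hour:minute."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # Field order is: minute hour day-of-month month day-of-week command
    return f"{minute} {hour} * * * {command}"


# Hypothetical schedule and script path.
line = daily_cron_line(2, 30, "/home/splunk/bin/pull_logs.sh")
print(line)
```

A line like this would go in the splunk user's own crontab (for example via crontab -u splunk -e as root), so the job runs with that user's permissions and SSH config.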

Next steps are to tweak this and see if I can grab failed logins for my blog and photo site as well; that way, I could correlate across all three login portals to see whether people are trying to access just one part of the site or several. Unfortunately, neither WordPress nor ZenPhoto logs failed authentication attempts by default, but there are solutions out there.

Now that I have some data, I’ll have to play around with some more dashboards for my website data in Splunk and update that. More to come!