We all know how important logs are. I use them regularly to find out why the automated firewall blocked an IP address, to track down an error in the Apache config, or to debug a problem with the Asterisk phone tree. Our network engineers use them to identify problems with switches and routers, and even the folks running Windows use Event Viewer to audit system logins or investigate system crashes.
Two key issues arise when relying on local logs (logs stored on the system itself, in Event Viewer, /var/log, and so on), and they matter because logs are such a critical part of the investigation process:
- When using logs to investigate a compromised machine, the logs themselves are almost entirely untrustworthy as there is very little stopping an attacker from modifying the logs and removing traces of themselves.
- If the logs are stored in volatile memory, or the system is designed to erase or overwrite them on reboot (common on network equipment), then the logs are lost forever if the system restarts, whether the restart is part of the problem, part of the resolution, or happens afterwards before the logs can be retrieved for the investigation.
There are only a few solutions to these issues, and the most common is a centralized logging system. If you have more than two or three systems, you can immediately begin reaping the benefits of central logging. Since Vista/Server 2008, Windows Event Viewer has included event forwarding as part of its core functionality; for XP and Server 2003 there was an add-in that provided the same feature. From there it is a fairly simple configuration to set up a single collector server and point the clients at it. On Linux there are a number of tools; one of the most common is rsyslog, which can both receive and transmit log entries. There are also hosted services such as Loggly and Splunk, which offer free and paid tiers to store logs on your behalf, and downloadable tools such as Graylog, which acts as a receiver for syslog connections and lets you search the logs, run statistics, or send alert emails when specific entries appear. There are countless other tools, or you can even write your own.
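As a quick sketch of the Windows side: on Vista/Server 2008 and later, the built-in forwarding is driven by WinRM and the Windows Event Collector service. A minimal collector-initiated setup looks roughly like this (run in elevated prompts; which machine plays collector is up to you):

```
:: On the collector machine: configure the Windows Event Collector service
wecutil qc

:: On each source machine: enable WinRM so the collector can subscribe
winrm quickconfig
```

From there you create a subscription under "Subscriptions" in Event Viewer on the collector and pick the source computers and event channels to pull.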
The key is to ensure your logging server is as secure and as reliable as possible. If the log server is down it can't receive logs, so if an event occurs while it is offline you will need to grab the logs from the device itself rather than relying on the central system. Security matters just as much: if an intruder can reach the logging server, then suddenly none of your logs are safe.
Some have already been mentioned above, but other useful tasks that a centralized system makes easier include:
- Triggering alerts based on log entries. Do you have a Known Issue that happens occasionally but isn’t often enough or critical enough to get fixed, but still warrants immediate attention? Set up an alert so that if it occurs on any of your systems, you can be alerted to it as soon as possible.
- Statistics and other data. Do you suspect your rate of bounced emails is trending up? Maybe you want to gather page-hit data across all of your web servers for the last month. With all of your logs forwarded to a central location, you have far more power to run analyses on them, mine data, or gather statistics on your environment. Each entry is still tagged with the machine it came from, so extracting the relevant data remains easy while keeping everything together. I also use a tool called logcheck, which pulls all of my logs for the previous hour, drops lines matching regular expressions (for the entries I know are there but really don't care about), and emails me the results. By centralizing my logs I get a single email covering all of my systems, and I only need to maintain one set of ignore files.
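For reference, a logcheck ignore file is just a list of extended regular expressions, one per line, dropped into /etc/logcheck/ignore.d.server/ on Debian-style systems. A sketch of a rule that silences routine cron session noise (the filename is illustrative):

```
# /etc/logcheck/ignore.d.server/local-cron
# Drop the pam_unix session open/close lines that cron generates every run
^\w{3} [ :[:digit:]]{11} [._[:alnum:]-]+ CRON\[[0-9]+\]: pam_unix\(cron:session\): session (opened|closed) for user root.*$
```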
- Verifying data integrity on the hosts themselves. If you've had a break-in and are concerned the logs have been tampered with, you can not only use the central copy to find what is missing, you can also run a diff between the log on the central server and the log on the host to see exactly what differs between them.
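The diff itself needs nothing fancy. A small illustration (the file contents and paths are made up here; in practice you would compare the host's /var/log file against the central server's copy of the same log):

```shell
# Simulate the two copies: the central server's copy and the host's
# (possibly tampered) copy. Contents and filenames are illustrative.
printf 'sshd: Accepted publickey for root\nsshd: Failed password for admin\n' > central-auth.log
printf 'sshd: Accepted publickey for root\n' > host-auth.log

# Lines prefixed with '<' exist only in the central copy; those are the
# entries that were removed from the host. diff exits non-zero when the
# files differ, so mask that when scripting.
diff central-auth.log host-auth.log || true
```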
From here, it’s on you to figure out how to set up remote logging for your environment. I recommend rsyslogd for Linux and similar systems, or you can read up on centralizing the Event Viewer for Windows. Either way, I can hardly recommend centralized logging enough.
Here are a few tips:
Set up an rsyslog server. Configure all of your syslog clients to send their logs to it remotely (syslog, rsyslog, and syslog-ng all support this).
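As a minimal sketch (the hostname and the choice of port are assumptions to adapt for your network), the server side enables rsyslog's UDP and TCP input modules, and each client gets a one-line forwarding rule:

```
# /etc/rsyslog.conf on the central server: accept logs over UDP and TCP
module(load="imudp")
input(type="imudp" port="514")
module(load="imtcp")
input(type="imtcp" port="514")

# /etc/rsyslog.conf on each client: forward everything to the server
# (a single '@' means UDP, '@@' means TCP)
*.* @@logs.example.com:514
```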
Throw Logstash and Kibana on the central logging server (http://logstash.net/ and http://www.elasticsearch.org/overview/kibana/). Set up a few filters in the web UI for error messages from SSH and the like.
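A sketch of what such a filter might look like in a Logstash config file (the port, the tag name, and the match pattern are assumptions to adapt):

```
# Listen for syslog traffic, tag SSH authentication failures,
# and index everything into Elasticsearch for Kibana to search.
input {
  syslog { port => 5514 }
}
filter {
  if [program] == "sshd" and [message] =~ /Failed password/ {
    mutate { add_tag => ["ssh_auth_failure"] }
  }
}
output {
  elasticsearch { }
}
```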
Configure your Windows machines to send their logs to syslog. There are multiple tools for this; I have had good experience with this one: https://code.google.com/p/eventlog-to-syslog/ – it also supports remote logging.
Once you have all your logs in one place, set up more filters and aggregate data into dashboards. If you have a web-shop application that can log, configure a dashboard to show how many sales you made, or how many failed logins there were.
You can even go so far as to couple Kibana/rsyslog to tag logs that match a filter and send the tagged entries to Nagios or Zabbix for alerting…
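One way to wire up the rsyslog side of that is a property-based filter that hands matching messages to an external program via rsyslog's execute-program action; here submit-to-nagios is a hypothetical script of your own that would, say, submit a passive check result to the monitoring host:

```
# rsyslog: pipe any message containing "CRITICAL" to a local script
# (the script name and path are illustrative, not a shipped tool)
:msg, contains, "CRITICAL" ^/usr/local/bin/submit-to-nagios
```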