I remember hitting my first 100 logs per second still very distinctly. After a few weeks of tweaking Elasticsearch to get an understanding of it, and fighting with Logstash and grok to make it actually do what I wanted….all without crashing Elasticsearch due to the very anemic resources I had given it and having Logstash actually keep up with the ingestion rate without falling behind and dropping events. (and did I mention getting grok to do what I wanted without using 100% CPU??!?!). Created a Kibana visualization that tracked the metrics from Logstash about the processing rate and looked on with admiration, 100+ events per second, data was parsed into the correct fields, and it was just humming along. It was a very happy moment.
Then came 200 events per second, then 500 events per second, then 1000 events per second. Each one of those required overcoming a whole different set of hurdles, in hindsight at 100 events per second you can pretty much do things very wrongly and everything will still work but as the message ingestion rate increases those errors and inefficiencies will start to compound until you reach a threshold of a certain events/second that is difficult to overcome and usually requires re-working a major part of your log processing pipeline.
Then came the 1,000,000,000+ events per day watermark, which works out to about 11,500 events per second on a 24 hour average. Logging at this message rate introduces a whole different set of considerations and requirements just to keep things from imploding once there is the inevitable problem. Some of the main points:
- Decoupling of the Logstash input and Logstash indexing is a requirement if you have any UDP log sources.
- A buffer is required to store events in when the Logstash/Elasticsearch slows down or in order to handle messaging spikes
- Separating different log sources into different Logstash input ports becomes unmanageable and prone to error
- Having excess processing power available in a scalable manner is necessary to be able to handle hardware failures, message spikes, and catching back up after a failure
With logging sources like the popular Beats packages, if Logstash stops responding the Beats will just wait and once Logstash comes back online, start sending logs from where it left off. With UDP log sources the sending application will keep sending regardless of if Logstash is online or not and there is no way to scale back the ingestion rate of logs without resorting to dropping them. Separating out the input stage of Logstash with the indexing stage of Logstash with buffer between the two is the most critical architecture decision you can make. The input stage is only receiving logs from the sending devices, figuring out which log is which, then writing it to a buffer for later processing. The Logstash indexers then read from buffer, process the data, and write to Elasticsearch. The flexibility and power of decoupling the two is immense when dealing with large log processing rates.
- Huge message spike? No problem, the Logstash input servers are mostly idle and will soak up that spike of messages and write it into the buffer without issue.
- Typo a Logstash config and it is not working properly now? 95% of the config is done on the Logstash indexing servers so the worst that happens is the messages queue up in the buffer while you fix the issue. The Logstash input servers and configs will hardly ever be touched or changed.
- Elasticsearch rebuild or maintenance that is slowing down the indexing rate in Elasticsearch, which causes the Logstash indexers to slow down due to output latency? You can turn off the indexers to give Elasticsearch more resources to rebuild faster, then turn the Logstash indexers back on after Elasticsearch is done.
On the left you have your different types of log types and sources. All syslog sends to the default syslog ports on the load balancers, which is then forwarded to a Logstash input server. Same for all other log source types. The Logstash input servers identify what type of log it is, then write to a buffer (Redis is simple and works well for this purpose). The Logstash indexers then read from the Redis servers, process the logs, then write to a locally installed Elasticsearch client. The Elasticsearch client then distributes the write of the logs into the Elasticsearch cluster.
Logstash input servers
- Relatively speaking, these servers do not do much. They read from an input plugin, do some tagging and some simple regex to determine which log is which, then they write to Redis.
- Servers are setup for redundancy in a N-1 scenario, so a single server can handle 100% of the workload
- 20 GHz (physical, not hyperthreaded) and 8 GB of RAM is more then enough power per server (unless you really suck at writing regex…) to hit the 15,000 per second logging rate.
- Growth method is to scale out, add an additional Logstash server and add it to your load balancing config.
- Redis does not use much CPU when used as a buffer and by design is single threaded. A single Redis server running as a buffer can hit well over 100,000 events per second and be < 100% on a single CPU core.
- A simple way to consolidate hardware and save on costs is to have Redis running on the Logstash input servers. Logstash uses the CPU, Redis uses the RAM.
- Size the RAM based on log size/ingestion rate and how long you want to be able to buffer logs for. Logs from firewalls tend to be short and not use up much space, logs from the Windows Event Log or any Java log tend to be large and take up huge amounts of space.
- For a mixed set of logs at 1 billion logs per day, 100GB in Redis gives between 1-3 hours of buffering before filling up.
Logstash indexing servers
- These servers will run very hot on CPU usage. Also, more importantly, these servers need to be powerful enough to catch back up and clear out the buffer after an outage or delay. If you are inputting logs at 15,000 per second, your buffer is sized for 1 hour of retention, and you want your Logstash indexers to be able clear the buffer in 1 hour, the indexers will need to be able to run at 30,000 logs per second.
- 40 GHz (physical, not hyperthreaded) and 64 GB of RAM will allow each server to roughly process 10,000 to 20,000 events per second depending on the complexity of the configuration and log. An Elasticsearch client runs on the server and at least 16 GB of RAM is assigned to the ES JVM.
- Growth is very simple, just add another server.
Up above I mentioned that creating a dedicated Logstash input for every log type is not a sustainable practice. This is a particularly ironic problem because a large number of example configs and blogs will suggest creating a different Logstash input per application as a method to differentiate between different log types…but this method will start to fall apart after you have 10 or 20 or 100 different log sources and log formats, with 100 different Logstash inputs on each server, each with a different port number. Eventually somebody will send something to the wrong port because keeping track of 100 different ports is complicated, and it will break all of the Logstash parsing and really trash up your Elasticsearch index. You will also start to run into performance issues when using so many different input plugins.
Instead, do yourself a favor and from the start send all syslog logs to the same syslog port(s) and use Logstash to figure out which one is which. Most logs are unique enough either by the server/device name or the format itself, IF statements with regex strings and if there is a match then label that log as a certain log type. Lets take a look at a syslog since it will be the most common thing you run into this problem with.
You take the various different syslog input ports and tag the log with the associated info and then tag the log with the name of the Logstash server, this way you can determine which input plugin and which Logstash input server processed a particular log. Then you do a large IF | ELSE IF | ELSE tree searching for certain unique strings in the logs. Almost all logs either have a unique format, or a unique value in the log itself. Put the most frequent log types at the beginning, and the less frequent at the end to save on CPU usage since the truth statement will escape once matched. Also keep in mind that the truth statement is processed in order, sometimes logformat A and logformat B will both have the same value X, while logformat A has a unique value Y. First, filter on value Y and if true then logformat = A, then filter on value X and if true then logformat = B. Then most importantly at the end there is the ELSE statement that acts as a catchall if nothing else matches. Both important for troubleshooting your Logstash config so you actually know you have a problem and also so that no logs are ever dropped.
Very simple and very short config that is light on the CPU, just some tagging on the input plugins, a mutate to add the servername tag, some if statements to search for regex strings in the log message, a mutate to add the logformat field with the specific value, then a (very fast) output to Redis. This allows for the Logstash service to mostly be idle which in turn allows it to spend more time servicing/waiting on the input plugins which means few if none UDP based logs will be dropped.
Then how to handle the syslog messages on the Logstash indexing servers.
Since the log type was figured out on the Logstash input servers it is just a simple matter of filtering based on that field name on the Logstash indexer servers and then applying the log specific parsing.
Why parse out the log type on the Logstash input servers instead of the Logstash indexers? Its a good question and the answer revolves mostly around the higher rate of log ingestion. When you are at lower logging rates, say only 500 events a second input, and the indexers are taken offline for 1 hour of maintenance, when you turn the indexers back on it is only 1.8 million logs in the buffer to process. You can do them in FIFO manner and there really is no reason to do any log type parsing on the Logstash input servers at all. When you are running 15,000 events per second though and the same indexers are taken offline for 1 hour of maintenance, when you turn the indexers back on it is 54 million logs in the buffer to process on top of the continuing input stream of 15,000 events per second that are still coming in. When the log types are filtered and determined on the Logstash input servers it allows you as the ELK administrator to prioritize certain logs over others and write them to separate Redis lists. The Logstash indexers will flush out the higher priority Redis buffers first, then more slowly flush out the lower priority ones.
In this example you determine which logs are high importance and low importance and write them to the respective lists. The high importance logs have a greater number of Logstash indexer servers assigned to it and it will flush the logs much more quickly. The lower importance logs have fewer Logstash indexer servers and it will process them at a lower set rate and take much longer to flush the buffer.
In this example it is meant to deal with that Beats based log collectors can and will backfill logs up to a configurable date. If you are bringing servers back online or deploying new agents and you want to send 30 days of back logs from the servers to ELK it can cause everything in the ELK pipeline to congest behind all of the old logs being sent. Instead on the Logstash input servers any log older then a certain number of days is output to a different Redis buffer and processed by a seperate Logstash indexer service. This allows for current logs to continue on at normal processing speed but allow old logs to still process but at a much slower speed.
What happens if you want to collect Windows firewall logs? They are in a syslog like format in a file that is collected by Filebeat and sent to Logstash input servers in a JSON format. On the Logstash indexers there is a drastic difference inbetween parsing out a text based (syslog like) log message and parsing out a well formatted JSON message such as from Winlogbeat, text based takes a lot of grok where as JSON based takes relatively little if none. By filtering the type of log on the Logstash input servers, the Windows firewall logs can be stripped out and put into the Redis buffer meant for text based messages to be later processed by the much more powerful Logstash indexers that are dedicated towards text based messages (…grok).