How do you rate limit the event rate to an output in Logstash?  Perhaps you are outputting to another system that is licensed for a certain event rate per second, or perhaps you want to protect Elasticsearch by preventing large spikes of ingested logs from being dumped into it all at once.  Elastic has recommended a combination of the throttle filter plugin and the sleep filter plugin, and while that does work, it is not accurate and is a very roundabout way to do it.
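
For comparison, that throttle-plus-sleep approach looks roughly like the sketch below: the throttle filter tags events once a count threshold is exceeded within a period, and the sleep filter pauses the pipeline worker for tagged events.  The threshold, period, sleep time, and key values here are illustrative placeholders only.

#####################
## throttle + sleep approach (shown for comparison only)
filter {
  throttle {
    after_count => 1000     # tag every event after the first 1000 in the period
    period => "1"           # period length in seconds
    max_age => 2
    key => "all"            # constant key so all events share one counter
    add_tag => "throttled"
  }
  if "throttled" in [tags] {
    sleep {
      time => "1"           # pause the worker for one second per throttled event
    }
  }
}
#####################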

Let's explore the two examples above.  In the first, Logstash is outputting to another application that enforces an exact rate limit due to licensing.  Many SIEMs are licensed this way, as are a handful of other applications.  The daily average event rate is under the licensed limit, but short spikes in log ingestion easily exceed it.  While most SIEMs will buffer the excess internally and process it later, the important distinction is that if the Logstash output spikes above the SIEM's licensed limit, the SIEM buffers ALL logs, not just the logs from Logstash.

The second example is a Logstash pipeline that places a buffer between the ingestion stage and the indexing stage.  Logs are received and written to a buffer such as Kafka or Redis by one set of Logstash servers, and a second set of Logstash servers reads from the buffer, parses the logs, and writes them to Elasticsearch.  This is a very common and popular design because it decouples the ingestion event rate from the rest of the pipeline.  However, one design feature can also be a drawback.  The Logstash servers that parse and index the data are often faster and more powerful than the servers that ingest and buffer the logs, so that they can drain the buffer whenever logs queue up.  The average ingestion rate might be 10,000 events per second while the indexing Logstash servers are capable of processing 100,000 per second.  Left to run unrestricted, the large spikes that occur when flushing the queue can have a dramatic performance impact on Elasticsearch.
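
As a rough sketch of that two-tier layout (the Beats port, Kafka brokers, topic, consumer group, and index names below are placeholders, not from any particular deployment):

#####################
## ingest tier: receive logs and write them to the Kafka buffer
input {
  beats {
    port => 5044
  }
}
output {
  kafka {
    bootstrap_servers => "kafka01:9092"
    topic_id => "logstash-buffer"
    codec => json
  }
}

## indexing tier: read from the buffer, parse, and index into Elasticsearch
input {
  kafka {
    bootstrap_servers => "kafka01:9092"
    topics => ["logstash-buffer"]
    group_id => "logstash-indexers"
    codec => json
  }
}
output {
  elasticsearch {
    hosts => ["http://es01:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}
#####################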


Below is a simple Ruby script that will rate limit to a preconfigured amount.  It *must* only be used where the pipeline can exert backpressure on the input plugin; the ideal situation is reading from a buffer/queue.  If not, logs will overflow the input plugin queue and be dropped.  In testing, the output rate stays close to the configured limit, though there is some variability, as this is not meant to be a high-precision rate limiter.

Below is the Ruby code.  Here are the configurable variables:
—  @ratelimit = the maximum events per second
—  @window = the window size in seconds to track.  1 works well for most cases
—  @sleeptime = the amount of time in seconds to sleep before checking whether it is time to emit another log.  0.1 seconds works well
—  @metricflushtime = the interval in seconds between flushing metrics in a new event

 

#####################
## Author: packetrevolt.com
#####################
## simple rate limiter
ruby {
  init => "
    # ratelimiter config settings
    @ratelimit = 1000       # maximum events per second
    @window = 1             # window size in seconds to track
    @sleeptime = 0.1        # seconds to sleep between checks while waiting
    @metricflushtime = 1    # metrics flush interval in seconds

    # initialize variables
    @bucket = Array.new     # timestamps of recently emitted events
  "
  code => "
    # Leaky bucket: hold up to (@ratelimit * @window) timestamps and only
    # emit an event once there is room in the bucket for its timestamp
    loop do
      if @bucket.count < (@ratelimit * @window)
        # bucket has room: record this event and let it through
        @bucket.insert(0, Time.now.to_i)
        break
      else
        if Time.now.to_i - @bucket.last.to_i < @window
          # oldest event is still inside the window: wait and re-check
          sleep @sleeptime
        else
          # oldest event has aged out: replace it and let this event through
          @bucket.pop
          @bucket.insert(0, Time.now.to_i)
          break
        end
      end
    end
  "
}
#####################
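
To show where the limiter sits, here is how the filter might be placed in the indexing tier's filter block, a sketch only, after whatever parsing filters the pipeline already has.  Because the ruby filter blocks inside the pipeline worker, the kafka input simply consumes more slowly (backpressure) instead of dropping events.

#####################
## example placement in the indexing-tier pipeline (sketch)
filter {
  # parsing filters (grok, date, mutate, etc.) run first

  # rate limiter from above, placed last so only events that will
  # actually be indexed count against the limit
  ruby {
    init => "..."   # same init block as shown above
    code => "..."   # same code block as shown above
  }
}
#####################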

This Post Has 2 Comments

  1. It seems @metricflushtime isn’t used?

    1. Correct. Good catch. The script above was simplified from a more complex rate limiter with more features; the @metricflushtime variable is used in that more complex rate limiter but not in this one. I'll create a new blog post documenting the more complex rate limiter in the future.
