How can I configure filebeat to only ship a percentage of logs (a sample if you will) to logstash?

In my application's log folder the logs are chunked to about 20 megs each. I want filebeat to ship only about 1/300th of that log volume to logstash.

I need to pare down the log volume before it goes over the wire, so I cannot do this filtering in Logstash; it has to happen on the endpoint before the logs leave the server.

I asked this question in the ES forum and someone said it was not possible with filebeat: https://discuss.elastic.co/t/ship-only-a-percentage-of-logs-to-logstash/77393/2

Is there really no way I can extend Filebeat to do this? Can NXLog or another product do this?

red888
Yes, NXLog can do this and it has a low footprint. You can filter and drop() based on various conditions (e.g. a regex match). – b0ti Mar 11 '17 at 10:55
  • hmmm regex? Not sure if that's gonna work, I don't want to filter types of messages out I want to only send a percentage of the same type of messages. – red888 Mar 12 '17 at 15:15

2 Answers

To the best of my knowledge, there is no way to do that with FileBeat. You can do it with Logstash, though.

filter {
  drop {
    percentage => 99.7
  }
}

This may be a use-case where you would use Logstash in shipping mode on the server, rather than FileBeat.

input {
  file {
    path => "/var/log/hugelogs/*.log"
    tags => [ 'sampled' ]
  }
}

filter {
  drop {
    percentage => 99.7
  }
}

output {
  tcp {
    host => 'logstash.prod.internal'
    port => 3390
  }
}

This means installing Logstash on your servers, but configured as minimally as possible: just an input, enough filters to get the desired effect, and a single output (TCP in this case, but it could be anything). Full filtering happens further down the pipeline.
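For scale: shipping 1/300th of the volume means keeping about 0.33% of events, and `percentage => 99.7` drops 99.7%, keeping roughly 1 in 333. A quick plain-Python sketch (not Logstash code; the function name is mine) of what percentage-based dropping amounts to:

```python
import random

# Hypothetical stand-in for the Logstash drop filter's `percentage`
# option: each event is independently dropped with that probability.
def should_drop(percentage):
    return random.random() * 100 < percentage

random.seed(42)  # fixed seed so the run is repeatable
total = 1_000_000
kept = sum(0 if should_drop(99.7) else 1 for _ in range(total))
print(kept / total)  # a fraction near 0.003, i.e. about 1 event in 333
```

Because each event is dropped independently, the sample rate is only approximate over short windows, but converges to the configured percentage over large volumes.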

sysadmin1138
  • I thought the log shipping version of logstash was deprecated. – red888 Mar 11 '17 at 01:22
  • 1
    @red888 It is, however Logstash itself can be configured to ship. It's not as low footprint as FIleBeat, but it will get the job done. It's how we do it. – sysadmin1138 Mar 11 '17 at 04:12

There's no way to configure Filebeat to drop arbitrary events based on a probability, but Filebeat can drop events based on conditions. There are two ways to filter events.

Filebeat can be told which lines to include or exclude as it reads a file, using the include_lines and exclude_lines settings in the config file. This is the most efficient place to apply filtering because it happens earliest.

filebeat.prospectors:
- paths:
  - /var/log/myapp/*.log
  exclude_lines: ['^DEBUG']

All Beats have "processors" that let you apply an action based on a condition. One action is drop_event, and the supported conditions include regexp, contains, equals, and range.

processors:
- drop_event:
    when:
      regexp:
        message: '^DEBUG'
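The range condition mentioned above works the same way; as a hedged sketch (the field name here is hypothetical, assuming your log events carry a numeric status field):

```yaml
processors:
- drop_event:
    when:
      range:
        http.response.code:
          gte: 200
          lt: 300
```

This would drop successful responses while keeping everything else, but note that all of these conditions are deterministic per event; none of them gives you probabilistic sampling.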

A J
  • Well, I'm not trying to filter out specific lines, the lines that I want will be of a huge volume so I want to ship only a percentage. Could I do something with drop_event and a condition that would cause it to only send x % of logs over a period of time? – red888 Mar 12 '17 at 15:02
  • Could I: Set a variable with a timestamp. Then the condition would be send logs if current time < ( timestamp + 5mins ). If current time > ( timestamp + 5mins ) don't send logs until current time == (current time + 30mins). Then set the timestamp variable again and start over. Is something like this possible? – red888 Mar 12 '17 at 15:12