I'm running fluentd in Docker (Alpine image, started via docker-compose) to collect messages from a GELF input.
For the output, I need to send the messages to a third party using a Python SDK, and the output must be synchronous, i.e. only one output script running at a time.
So I wanted to use the exec output plugin to run the Python script:
<match logs.*>
  @type exec
  command python3 /src/output.py
  format json
  <buffer>
    @type file
    path /var/log/buffer
    flush_interval 1s
  </buffer>
</match>
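For reference, with this plugin fluentd invokes the command once per flushed buffer chunk and appends the chunk file's path as the last argument, so my output.py is shaped roughly like the sketch below (the SDK call is a placeholder for the real third-party client):

```python
# output.py for the exec output plugin: invoked once per flushed buffer chunk.
# With "format json" the chunk file contains one JSON record per line, and
# fluentd appends the chunk file's path as the last command-line argument.
import json
import sys


def send_to_third_party(record):
    # Placeholder for the real Python SDK call.
    print(record)


def main(chunk_path):
    with open(chunk_path) as chunk:
        for line in chunk:
            line = line.strip()
            if line:
                send_to_third_party(json.loads(line))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[-1])
```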
The problem is that this plugin only works asynchronously: it doesn't wait for one command invocation to finish before launching the next, so under high throughput I end up with multiple output processes running at once.
Later I found exec_filter, which is meant for filtering, but I figured I could use it for my scenario since it is synchronous: it runs only one output process, which receives the events via stdin.
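In exec_filter mode the child process is long-lived (num_children defaults to 1) and receives one event per line on stdin, so the script becomes a read loop, roughly like this sketch (the SDK call is again a placeholder; diagnostics go to stderr because fluentd parses the child's stdout as emitted events):

```python
# output.py for exec_filter mode: a single long-lived child process that
# reads one JSON event per line from stdin and forwards it synchronously.
# Note: fluentd parses this process's stdout as new events, so anything
# diagnostic should go to stderr instead.
import json
import sys


def send_to_third_party(record):
    # Placeholder for the real Python SDK call.
    print(record, file=sys.stderr)


def run(stream):
    for line in stream:
        line = line.strip()
        if not line:
            continue
        send_to_third_party(json.loads(line))


if __name__ == "__main__":
    run(sys.stdin)
```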
When I did some performance testing (with a memory limit of 128 MB on the container and swap disabled), the container got an OOM. The problem is that when it restarted after the OOM, it didn't run the output.py script, and the buffer chunks keep failing since there is no output process running. Docker logs, this is the OOM I assume:
2023-04-14 10:45:22 +0000 [warn]: #0 failed to flush the buffer. retry_times=0 next_retry_time=2023-04-14 10:45:24 +0000 chunk="5f949677e6035c6b397762e21f7ae37d" error_class=IOError error="stream closed in another thread"
2023-04-14 10:45:22 +0000 [warn]: #0 /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.16.0/lib/fluent/plugin/buffer/memory_chunk.rb:86:in `write'
2023-04-14 10:45:22 +0000 [warn]: #0 /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.16.0/lib/fluent/plugin/buffer/memory_chunk.rb:86:in `write_to'
2023-04-14 10:45:22 +0000 [warn]: #0 child process exits with error code code=9 status=nil signal=9
And then I get this repeatedly:
2023-04-14 10:45:23 +0000 [warn]: #0 failed to flush the buffer. retry_times=1 next_retry_time=2023-04-14 10:45:26 +0000 chunk="5f949677e6035c6b397762e21f7ae37d" error_class=RuntimeError error="no healthy child processes exist"
2023-04-14 10:45:23 +0000 [warn]: #0 /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.16.0/lib/fluent/plugin/out_exec_filter.rb:282:in `write'
2023-04-14 10:45:23 +0000 [warn]: #0 /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.16.0/lib/fluent/plugin/output.rb:1225:in `try_flush'
2023-04-14 10:45:23 +0000 [warn]: #0 /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.16.0/lib/fluent/plugin/output.rb:1538:in `flush_thread_run'
2023-04-14 10:45:23 +0000 [warn]: #0 /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.16.0/lib/fluent/plugin/output.rb:510:in `block (2 levels) in start'
2023-04-14 10:45:23 +0000 [warn]: #0 /usr/lib/ruby/gems/3.1.0/gems/fluentd-1.16.0/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
I didn't find any more logs to help me investigate.
So, two questions:

1. For my scenario (synchronous output), is there a better fit than exec_filter?
2. How can I further investigate why output.py is not running?