I'm pushing my logs to a local Splunk installation. Recently I noticed that the following error shows up repeatedly (about once every minute):
Error L10 (output buffer overflow): 7150 messages dropped since 2013-06-26T19:19:52+00:00.134 <13>1 2013-07-08T14:59:47.162084+00:00 host app web.1 - [\x1B[37minfo\x1B[0m] application - Perf - it took 31 milliseconds to fetch row IDs ...
According to the documentation, these errors happen when an application produces a lot of logs.
The thing is, I produce barely 20-30 log lines per second, which isn't really a lot. I also tested with other drains (I added the built-in Papertrail plugin), and these errors do not happen there, so they are specific to the outgoing Splunk drain.
I thought maybe the Splunk machine was overloaded and therefore not accepting logs fast enough, but its CPU is idle and it has plenty of free disk space and memory.
Also, I believe the app (a Play 2 app) auto-flushes its logs to the console all the time, so there is no big buildup of unflushed logs followed by a sudden release.
What could cause the outgoing Splunk drain to be this slow? How should I debug it?
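In case it helps with debugging, here is a minimal sketch of how the dropped-message count and the "since" timestamp could be pulled out of these error lines, so the drops can be tracked over time. It assumes Python 3.7+ and that every L10 line has the same shape as the sample above; parse_l10 is a hypothetical helper name of my own, not part of any existing tool.

```python
import re
from datetime import datetime

# Assumption: every L10 line looks like the sample in this post, i.e.
# "Error L10 (output buffer overflow): <N> messages dropped since <timestamp>"
L10_RE = re.compile(
    r"Error L10 \(output buffer overflow\): "
    r"(?P<dropped>\d+) messages dropped since "
    r"(?P<since>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2})"
)


def parse_l10(line):
    """Return (dropped_count, since_timestamp) for an L10 line, else None.

    Hypothetical helper: it only extracts the two fields that are visible
    in the error text and does not interpret them any further.
    """
    match = L10_RE.search(line)
    if match is None:
        return None
    since = datetime.fromisoformat(match.group("since"))
    return int(match.group("dropped")), since


if __name__ == "__main__":
    sample = (
        "Error L10 (output buffer overflow): 7150 messages dropped "
        "since 2013-06-26T19:19:52+00:00."
    )
    print(parse_l10(sample))
```

Running this over the lines received by a drain that does work (Papertrail, in my case) would show whether the dropped count grows steadily or in bursts.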