Logging from Java app to ELK without need for parsing logs

Question

I want to send logs from a Java app to Elasticsearch, and the conventional approach seems to be to set up Logstash on the server running the app and have Logstash parse the log files (with regex...!) and load them into Elasticsearch.

Is there a reason it's done this way, rather than just setting up Log4j (or Logback) to log things in the desired format directly into a log collector, which can then ship them to Elasticsearch asynchronously? It seems crazy to me to have to fiddle with grok filters to deal with multiline stack traces (and burn CPU cycles on log parsing) when the app itself could just log in the desired format in the first place.

On a tangentially related note, for apps running in a Docker container, is it best practice to log directly to Elasticsearch, given the need to run only one process?

Even if you send a nice JSON document straight to Elasticsearch, there can still be business intelligence that should be applied along the way. That's a great use for Logstash. Also, most people don't live in a homogeneous world, so using one aggregator can be powerful. TMTOWTDI, for sure. – Alain Collins, Aug 31 '15 at 00:22
I feel this is mainly for scalability reasons. If the application pushes logs to Elasticsearch directly, back pressure from a slow Elasticsearch can affect application performance, and if the application queues a lot of logs in main memory, that will certainly have an adverse effect. – Vineeth Mohan, Aug 31 '15 at 04:57

Val · Answer 1 · 2015-08-31T05:58:58.040

9

If you really want to go down that path, the idea would be to use something like an Elasticsearch appender (or this one or this other one) which would ship your logs directly to your ES cluster.

However, I'd advise against it for the same reasons mentioned by @Vineeth Mohan. You'd also need to ask yourself a couple of questions, chiefly: what would happen if your ES cluster goes down for any reason (OOM, network down, ES upgrade, etc.)?

There are many reasons why asynchronicity exists, one of which is the robustness of your architecture, and most of the time that's much more important than burning a few more CPU cycles on log parsing.
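To make the trade-off concrete, here is a minimal sketch of what an in-process hand-off has to do if it must never stall the application. The class name `BoundedLogQueue` and its methods are hypothetical and not taken from any of the appenders mentioned above. When the downstream (for example ES) is slow or down, the queue fills up and events are dropped rather than blocking the caller or growing memory without bound.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-process hand-off between application threads and a log shipper thread.
class BoundedLogQueue {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    BoundedLogQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Called by application threads: never blocks; the event is dropped if the queue is full.
    boolean publish(String event) {
        boolean accepted = queue.offer(event);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    // Called by the shipper thread: returns the next event, or null if none is waiting.
    String poll() {
        return queue.poll();
    }

    // Number of events lost because the shipper could not keep up.
    long droppedCount() {
        return dropped.get();
    }
}
```

Writing to a local file and letting a separate shipper tail it gives you the same decoupling, with the disk acting as a much larger buffer that also survives an application restart.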

Also note that there is an ongoing discussion about this very subject in the official ES discussion forum.

edited Aug 31 '15 at 05:58

answered Aug 31 '15 at 05:53

Val


Emitting ambiguous text logs from structured data and parsing them again is an unnecessary complication. It's not about CPU cycles, it's about robustness of data. It's a shame to carefully extract stack traces when they were originally structured... And I don't understand why you are worrying about the ES cluster (especially if you configure redundancy with replication). It's much more probable to see Logstash/Flume or even Kafka/Redis dead than ES... – gavenkoa Sep 26 '17 at 22:10
@gavenkoa I don't know your context and your mileage may vary. Of course, on a single development or staging node, that doesn't make sense, but experience has shown that having this asynchronous pipeline provides much more robustness in real production settings for a multitude of reasons. Feel free to create a question with your detailed use case(s) and we can talk about it. – Val Sep 27 '17 at 03:11

score 2 · Accepted Answer · answered Aug 31 '15 at 06:05

I think it's usually ill-advised to log directly to Elasticsearch from a Log4j/Logback/whatever appender, but I agree that writing Logstash filters to parse a "normal" human-readable Java log is a bad idea too. I use https://github.com/logstash/log4j-jsonevent-layout everywhere I can to have Log4j's regular file appenders produce JSON logs that don't require any further parsing by Logstash.
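To illustrate the idea (this is not the implementation of log4j-jsonevent-layout, just a sketch using `java.util.logging` from the JDK so it runs without dependencies): each log record becomes one JSON object on one line, and the stack trace is carried as an escaped string field, so no multiline handling or grok pattern is needed downstream. The class name `JsonLineFormatter` and the field names `@timestamp`, `level`, `logger`, `message` and `stack_trace` are my own assumptions.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.time.Instant;
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;

// Hypothetical formatter: one JSON object per log record, one record per line.
class JsonLineFormatter extends Formatter {

    @Override
    public String format(LogRecord record) {
        StringBuilder sb = new StringBuilder("{");
        field(sb, "@timestamp", Instant.ofEpochMilli(record.getMillis()).toString());
        sb.append(',');
        field(sb, "level", record.getLevel().getName());
        sb.append(',');
        field(sb, "logger", String.valueOf(record.getLoggerName()));
        sb.append(',');
        field(sb, "message", String.valueOf(formatMessage(record)));
        if (record.getThrown() != null) {
            // The whole stack trace goes into a single string field.
            StringWriter trace = new StringWriter();
            PrintWriter writer = new PrintWriter(trace);
            record.getThrown().printStackTrace(writer);
            writer.flush();
            sb.append(',');
            field(sb, "stack_trace", trace.toString());
        }
        return sb.append("}\n").toString();
    }

    private static void field(StringBuilder sb, String name, String value) {
        sb.append('"').append(escape(name)).append("\":\"").append(escape(value)).append('"');
    }

    // Escapes quotes, backslashes and control characters so the value is a valid JSON string.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```

You would attach it to a regular file handler with `handler.setFormatter(new JsonLineFormatter())`. In a real project a maintained layout is the better choice, since it also handles things like MDC fields for you.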

score 2 · Answer 3 · answered Sep 26 '19 at 16:07


There is also https://github.com/elastic/java-ecs-logging which provides a layout for log4j, log4j2 and Logback. It's quite efficient and the Filebeat configuration is very minimal.

Disclaimer: I'm the author of this library.

answered Sep 26 '19 at 16:07

Felix


score 0 · Answer 4 · answered Oct 15 '17 at 00:15

If you need a quick solution, I've written an appender, Log4J2 Elastic REST Appender, that you can use. It can buffer log events based on time and/or number of events before sending them to Elastic (using the _bulk API so that they are all sent in one go). It has been published to Maven Central, so it's pretty straightforward to use.
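For readers wondering what that buffering amounts to, here is a rough sketch of the technique. It is not the code of the appender above: the class `BulkBuffer`, its parameters and the `sender` callback are hypothetical. Events are collected until either a count or an age threshold is reached, then joined into a single newline-delimited `_bulk` body in which each document is preceded by an `{"index":{}}` action line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical buffer: collects JSON documents and flushes them as one _bulk request body.
class BulkBuffer {
    private final int maxEvents;
    private final long maxAgeMillis;
    private final Consumer<String> sender; // stand-in for an HTTP POST to /<index>/_bulk
    private final List<String> pending = new ArrayList<>();
    private long oldestMillis;

    BulkBuffer(int maxEvents, long maxAgeMillis, Consumer<String> sender) {
        this.maxEvents = maxEvents;
        this.maxAgeMillis = maxAgeMillis;
        this.sender = sender;
    }

    // Adds one single-line JSON document; flushes when the buffer is full or too old.
    synchronized void add(String jsonDoc, long nowMillis) {
        if (pending.isEmpty()) {
            oldestMillis = nowMillis;
        }
        pending.add(jsonDoc);
        if (pending.size() >= maxEvents || nowMillis - oldestMillis >= maxAgeMillis) {
            flush();
        }
    }

    // Builds the newline-delimited body (action line, then document) and hands it to the sender.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        StringBuilder body = new StringBuilder();
        for (String doc : pending) {
            body.append("{\"index\":{}}\n").append(doc).append('\n');
        }
        pending.clear();
        sender.accept(body.toString());
    }
}
```

Note that this sketch only checks the age threshold when a new event arrives. A real appender would presumably also flush from a timer and on shutdown, and would have to decide what to do when the send fails.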

As others have already mentioned, the best way would be to write the logs to a file and then ship them to ES separately. However, I think there is value in this if you need to get something running quickly until you have the time/resources to implement the optimal way.
