Automatic log analysis and alert generation

Question

I would like some design advice for a centralized logging project I am considering. I have a number of components producing logs on various servers. Apache Flume looks like the sensible choice for streaming them to a central log server, most likely into an Elasticsearch instance for querying and analysis.

Here's my question: I would like to provide a scripting engine that listens to the flow of log events arriving on the central server. Would it make sense to do that as an interceptor in Flume, as a plugin to Elasticsearch, or as something else entirely?

I'm not that familiar with Flume, but have you considered Logstash, Elasticsearch and Kibana? I believe that Riemann can be tied into the mix as well for event awareness/flagging/monitoring purposes. See http://logstash.net/docs/1.1.13/outputs/riemann, http://three.kibana.org/ — James Addison, Aug 27 '13 at 05:29
Logstash and Flume fit in the same space actually, but coming from a Java background I'm more comfortable with the Apache tools. Elasticsearch and kibana3 are definitely in the mix. What's missing is the event processing component, so thanks a lot for the Riemann suggestion. It looks like the sort of thing I need, only in the Flume world. — MarkNS, Aug 27 '13 at 06:53

score 2 · Accepted Answer · answered Aug 28 '13 at 15:53

Flume was originally built as a pipeline into Hadoop/HBase, and it lets you do pretty much any sort of decorating, transforming and intercepting before an event reaches its final storage. That makes Flume a perfect place for pre-processing (alerting, in your case). The Flume sink can be Elasticsearch, so the logs will eventually end up there. To answer your question: it makes perfect sense to trigger all your alerts, alarms and notifications in the pipeline, before the logs reach their final destination. Both the old Flume and the Flume NG architectures are customisable and powerful in this regard.
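To make the "alert in the pipeline, then forward" idea concrete, here is a minimal sketch in plain Java. It does not use Flume's actual interceptor interface; `AlertingStage`, its `rule` and its `notifier` are hypothetical names chosen for illustration, and log events are reduced to plain strings. The point is only the shape: each event is checked against a rule, an alert goes out on a side channel if it matches, and the event itself is always passed on unchanged towards the sink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical stand-in for a pipeline stage; this is NOT Flume's own API.
final class AlertingStage {
    private final Predicate<String> rule;     // decides whether a log line deserves an alert
    private final Consumer<String> notifier;  // side channel (mail, pager, Riemann, ...)

    AlertingStage(Predicate<String> rule, Consumer<String> notifier) {
        this.rule = rule;
        this.notifier = notifier;
    }

    // Inspect one event, notify if the rule matches, and pass the event on unchanged.
    String intercept(String logLine) {
        if (rule.test(logLine)) {
            notifier.accept(logLine);
        }
        return logLine;
    }

    // Batch variant: every event is still forwarded to the sink (e.g. Elasticsearch).
    List<String> intercept(List<String> logLines) {
        List<String> forwarded = new ArrayList<>(logLines.size());
        for (String line : logLines) {
            forwarded.add(intercept(line));
        }
        return forwarded;
    }

    public static void main(String[] args) {
        List<String> alerts = new ArrayList<>();
        AlertingStage stage = new AlertingStage(line -> line.contains("ERROR"), alerts::add);

        List<String> out = stage.intercept(List.of(
                "INFO service started",
                "ERROR disk full on /var"));

        System.out.println("forwarded=" + out.size() + " alerts=" + alerts);
    }
}
```

In a real deployment the same logic would live inside whatever extension point your Flume version offers, and the rule could be delegated to a scripting engine rather than a hard-coded predicate.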

Another thing to mention: Elasticsearch is perfect for full-text search, but for analytics it can't compete with the Hadoop ecosystem. Cloudera CDH4.3 added SolrCloud to Hadoop, which is a plus for the combination of Flume + HDFS or HBase + Solr. It's worth looking at this mix as well.
