7

I want to export log files (in my case Apache access and error logs) from multiple nodes and aggregate that data in batch, as a scheduled job. I have seen several solutions that work with streaming data (e.g. Scribe). I would like a tool that lets me define the destination, because I want to use HDFS as the destination.

I have not been able to find a tool that supports this in batch. Before reinventing the wheel, I wanted to ask the StackOverflow community for input.

If a solution already exists in Python, that would be even better.

Mohan Gulati

4 Answers

1

We use http://mergelog.sourceforge.net/ to merge all our Apache logs.
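Since the question asks for Python, here is a minimal sketch of the same idea: merging per-node access logs into one chronologically ordered stream. This is not mergelog itself. It assumes each input is already sorted by time and uses the standard Common/Combined Log Format timestamp (error logs use a different format and would need their own parser). The names `timestamp` and `merge_logs` are hypothetical.

```python
import heapq
import re
from datetime import datetime

# Matches the bracketed timestamp in Common/Combined Log Format,
# e.g. [10/Oct/2000:13:55:36 -0700]
_TS = re.compile(r"\[([^\]]+)\]")


def timestamp(line):
    """Return the timezone-aware request time parsed from one access-log line."""
    match = _TS.search(line)
    if match is None:
        raise ValueError("no timestamp in line: %r" % line)
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def merge_logs(*sources):
    """Lazily merge already-sorted iterables of log lines into one sorted stream.

    Each source can be an open file object or any iterable of lines.
    heapq.merge only holds one pending line per source in memory.
    """
    return heapq.merge(*sources, key=timestamp)
```

Each source can be an open file, so a scheduled job could copy the logs from every node, write the merged stream to a local file, and then upload that file to HDFS (for example with `hadoop fs -put`, assuming a Hadoop client is installed on the machine running the job).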

Doon
0

Take a look at Zohmg. It's an aggregation/reporting system for log files that uses HBase and HDFS: http://github.com/zohmg/zohmg

alex
0

Scribe can meet your requirements. There is a version (link) of Scribe that aggregates logs from multiple sources and, once a given threshold is reached, stores everything in HDFS. I've used it and it works very well. Compiling it is quite complicated, so if you run into any problems, ask a question.

wlk
-1

PiCloud may help.

The PiCloud Platform gives you the freedom to develop your algorithms and software without sinking time into all of the plumbing that comes with provisioning, managing, and maintaining servers.

Jacob Schoen
UsAaR33