
I have log files (Log4j) downloaded from a remote machine. My requirement is to store the contents of these files in MongoDB. I am choosing MongoDB because it is a schemaless data store and hence ideal for storing log data. My question is: how do I convert the log data to JSON documents and store them in MongoDB? I can split each log entry with a regex, store the pieces in an object, and then persist it to Mongo, but I have a gut feeling there is a better way to do it. The log files I am talking about are at most 50 MB in size. This is how a sanitized log file looks:

2014-01-11T17:18:52.656260-08:00 localhost local0: Localhost 17:18:52.655 [INFO ] [..... | Timer-1 ] - asldknluenfbayewbfayewbdaiybdaiywbayhwbdsaas
2014-01-11T17:18:52.657649-08:00 localhost local0: Localhost 17:18:52.657 [INFO ] [..... | Timer-1 ] - dasdasldukjbfksbdfkajsnbdkasaasdasdasdasdasd
2014-01-11T17:18:52.659029-08:00 localhost local0: Localhost 17:18:52.658 [WARN ] [..... | Timer-1 ] - fjdshfaushdaksbdkasudhaksudbaksdbaksdasdasd
2014-01-11T17:18:52.661312-08:00 localhost local0: Localhost 17:18:52.660 [INFO ] [..... | Timer-1 ] - java.util.ConcurrentModificationException at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:365) at java.util.LinkedHashMap$KeyIterator.next(LinkedHashMap.java:376) at java.util.AbstractCollection.toArray(AbstractCollection.java:126) at java.util.ArrayList.addAll(ArrayList.java:473) at a.b.c.etc.SomeWrapper.rebuild(SomeWrapper.java:109) at a.b.c.etc.SomeCaller.updateCache(SomeCaller.java:421)
2014-01-11T17:18:52.661751-08:00 localhost local0: Localhost 17:18:52.661 [FATAL] [..... | Timer-1 ] - sdfsdfsdfsdfsdfsdfsdfsasdasdasdasdasdasdasd
2014-01-11T17:18:52.663283-08:00 localhost local0: Localhost 17:18:52.662 [ERROR] [..... | Timer-1 ] - sdasdasdasdas

I would be querying for data based on date range and log level, so those are the two fields I want, along with a field to hold the message itself. Any thoughts/help would be appreciated.
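For reference, here is a minimal sketch of the regex-split approach I have in mind, using the legacy MongoDB Java driver (`MongoClient`/`BasicDBObject`) and `java.time` to parse the microsecond timestamp. The database/collection names (`logdb`, `entries`) and field names (`ts`, `level`, `message`) are just placeholders I picked:

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.OffsetDateTime;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogImporter {

    // Matches lines like:
    // 2014-01-11T17:18:52.656260-08:00 localhost local0: Localhost 17:18:52.655 [INFO ] [..... | Timer-1 ] - message
    private static final Pattern LINE = Pattern.compile(
            "^(\\S+)\\s+\\S+\\s+\\S+\\s+\\S+\\s+\\S+\\s+\\[(\\w+)\\s*\\]\\s+\\[[^\\]]*\\]\\s+-\\s+(.*)$");

    public static void main(String[] args) throws IOException {
        MongoClient mongo = new MongoClient("localhost", 27017);
        DBCollection logs = mongo.getDB("logdb").getCollection("entries");

        // Compound index to support the "date range + log level" queries.
        logs.createIndex(new BasicDBObject("ts", 1).append("level", 1));

        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = LINE.matcher(line);
                if (!m.matches()) {
                    // Skip non-matching lines; a fuller version could append them
                    // to the previous entry's message for multi-line stack traces.
                    continue;
                }
                // java.time handles the 6-digit fractional seconds correctly;
                // SimpleDateFormat would misread them as milliseconds.
                Date ts = Date.from(OffsetDateTime.parse(m.group(1)).toInstant());
                logs.insert(new BasicDBObject("ts", ts)
                        .append("level", m.group(2))
                        .append("message", m.group(3)));
            }
        }
        mongo.close();
    }
}
```

With the compound index in place, a typical query would look like `logs.find(new BasicDBObject("ts", new BasicDBObject("$gte", from).append("$lt", to)).append("level", "ERROR"))`.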

vmr
  • Partial duplicate of [Parse a log4j log file](http://stackoverflow.com/questions/2327073). I think you're looking for a library or template to parse the `log4j` output (after which it should be straightforward to save the parsed results into MongoDB). Your comment about querying for data suggests a second question on schema design and indexing. I think you'd probably want to post that separately with examples of the schema approach(es) you are considering, common queries, and whether you have defined suitable indexes. – Stennie Jan 22 '14 at 14:09

1 Answer


I am not aware of a JSON appender, but maybe Logstash is your friend. It will watch your log directory and pump the files into MongoDB (see http://logstash.net/docs/1.3.2/outputs/mongodb).
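A minimal config sketch, assuming Logstash's file input and grok filter: the path, database/collection names, and grok pattern are placeholders for your setup, and the mongodb output options shown follow the plugin docs, so verify them against the 1.3.2 docs linked above.

```
input {
  file {
    # watch the directory where the downloaded log files land
    path => "/var/log/myapp/*.log"
  }
}

filter {
  # split each line into timestamp, level and message fields
  grok {
    match => [ "message", "%{TIMESTAMP_ISO8601:ts} %{HOSTNAME} %{WORD}: %{WORD} %{TIME} \[%{WORD:level}%{SPACE}\] \[%{DATA}\] - %{GREEDYDATA:msg}" ]
  }
}

output {
  mongodb {
    uri        => "mongodb://localhost:27017"
    database   => "logs"
    collection => "entries"
  }
}
```

This way you get the date-range and log-level fields you want to query on without writing any parsing code yourself.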

TobiSH