I have log files (Log4j) downloaded from a remote machine. My requirement is to store the contents of these files in MongoDB. I chose MongoDB because it is a schema-less data store and hence seems ideal for storing log data.

My question is: how do I convert the log data to JSON documents and store them in MongoDB? I can split each log entry with a regex, store the pieces in an object, and then persist it to Mongo, but I have a gut feeling there is a better way to do it. The log files I am talking about here are at most 50 MB in size. This is how a sanitized log file is going to look:
2014-01-11T17:18:52.656260-08:00 localhost local0: Localhost 17:18:52.655 [INFO ] [..... | Timer-1 ] - asldknluenfbayewbfayewbdaiybdaiywbayhwbdsaas
2014-01-11T13:18:52.657649-08:00 localhost local0: Localhost 17:18:52.657 [INFO ] [..... | Timer-1 ] - dasdasldukjbfksbdfkajsnbdkasaasdasdasdasdasd
2014-01-11T13:18:52.659029-08:00 localhost local0: Localhost 17:18:52.658 [WARN ] [..... | Timer-1 ] - fjdshfaushdaksbdkasudhaksudbaksdbaksdasdasd
2014-01-11T56:18:52.661312-08:00 localhost local0: Localhost 17:18:52.660 [INFO ] [..... | Timer-1 ] - java.util.ConcurrentModificationException
at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:365)
at java.util.LinkedHashMap$KeyIterator.next(LinkedHashMap.java:376)
at java.util.AbstractCollection.toArray(AbstractCollection.java:126)
at java.util.ArrayList.addAll(ArrayList.java:473)
at a.b.c.etc.SomeWrapper.rebuild(SomeWraper.java:109)
at a.b.c.etc.SomeCaller.updateCache(SomeCaller.java:421)
2014-01-11T17:18:52.661751-08:00 localhost local0: Localhost 17:18:52.661 [FATAL] [..... | Timer-1 ] - sdfsdfsdfsdfsdfsdfsdfsasdasdasdasdasdasdasd
2014-01-11T17:18:52.663283-08:00 localhost local0: Localhost 17:18:52.662 [ERROR] [..... | Timer-1 ] - sdasdasdasdas
I will be querying the data by date range and log level, so these are the two fields I want, along with a field to hold the message itself. Any thoughts/help would be appreciated.
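For what it's worth, the regex approach you describe can be kept quite small. Below is a minimal sketch in Python with pymongo, written against the sanitized format shown above; the pattern, the field names ("timestamp", "level", "message"), and the database/collection names are all assumptions you would adapt to your real layout. Stack-trace lines ("at ...") don't match the pattern, so they get appended to the message of the preceding entry, which keeps an exception together with the log line that produced it:

```python
import re
from datetime import datetime

# Pattern for the sanitized format above: ISO timestamp, four host/facility
# columns, "[LEVEL]", "[... | thread ]", then " - message". Adjust to taste.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2})\s+"
    r"\S+\s+\S+\s+\S+\s+\S+\s+"               # host, facility, host, local time
    r"\[(?P<level>\w+)\s*\]\s+\[.*?\]\s+-\s+(?P<msg>.*)$"
)

def parse_log(lines):
    """Turn raw log lines into dicts ready for insert_many().
    Continuation lines (stack traces) are folded into the previous entry."""
    docs = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            docs.append({
                # Store a real datetime, not a string, so Mongo can do
                # native range queries on it.
                "timestamp": datetime.strptime(m.group("ts"),
                                               "%Y-%m-%dT%H:%M:%S.%f%z"),
                "level": m.group("level"),
                "message": m.group("msg"),
            })
        elif docs:
            # No timestamp prefix: treat as part of the previous entry.
            docs[-1]["message"] += "\n" + line.strip()
    return docs

# Insertion sketch (db/collection names are placeholders):
# from pymongo import MongoClient
# coll = MongoClient()["logs"]["entries"]
# with open("app.log") as f:
#     coll.insert_many(parse_log(f))
# coll.create_index([("timestamp", 1), ("level", 1)])  # supports your queries
```

With that index in place, your two planned queries become e.g. `coll.find({"timestamp": {"$gte": start, "$lt": end}, "level": "ERROR"})`. At 50 MB per file, reading the whole file and doing one `insert_many` per file (or per few thousand lines) is fine; there is no need for anything fancier than this unless ingestion rate becomes a problem.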