
Hello everyone, I have been trying to do this for a week or so but cannot figure out a way. I work with Tomcat, and my client regularly sends me log files of 2 to 3 GB, saying there was an issue such as "file not found". Sometimes they don't have enough information to grep through the log files, so I decided to build a tool that can parse all the log files and categorize the log entries accordingly.

Now, I cannot hold 4 GB of data in memory, and I cannot write it back out to a file either, because reading 4 GB takes a lot of time even though I am using file channels and threads. A database is certainly not an option, since it would again slow the system down. So I want to know: is there any other way to store the parsed contents so that whenever I want to check 404 errors, I get all the 404 errors in a list?

I do not wish to use a database, so a database is certainly not the answer for this.

user2071270
  • Why not just read and write in two streams simultaneously? – Andremoniy Feb 14 '13 at 08:52
  • It takes 20 minutes just to read and parse the data. If I do the streaming simultaneously it will slow the system down, and for each search I have to load and parse again. – user2071270 Feb 14 '13 at 08:58
  • Nope. You do not need to load the entire file into memory. Just read it line by line and, when you find the needed info, write it to another file immediately (see the sketch after these comments). – Andremoniy Feb 14 '13 at 09:00
  • BTW, have you tried the `grep` utility? – Andremoniy Feb 14 '13 at 09:01
  • Reading line by line will take a lot of time. My client usually gives me only an approximate time, for example somewhere around 12, so I have to parse the log files from 11:30 to 12:30 to list all the errors. Sometimes he only tells me the alert he gets. – user2071270 Feb 14 '13 at 09:05
  • Why can't you use an already existing tool like [Lambda Probe](http://code.google.com/p/psi-probe/)? – Jayamohan Feb 14 '13 at 09:12
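For reference, a minimal sketch of the line-by-line filtering Andremoniy describes above, assuming the interesting lines can be found with a simple substring test (the file paths and the ` 404 ` needle are placeholders for whatever the real log format needs):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class LogFilter {
    public static void main(String[] args) throws IOException {
        // Placeholder paths and match string -- adjust to the real log format.
        String inputPath = "catalina.out";
        String outputPath = "errors-404.log";
        String needle = " 404 ";

        BufferedReader reader = new BufferedReader(new FileReader(inputPath));
        BufferedWriter writer = new BufferedWriter(new FileWriter(outputPath));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // Write matching lines out immediately instead of keeping them in memory.
                if (line.contains(needle)) {
                    writer.write(line);
                    writer.newLine();
                }
            }
        } finally {
            reader.close();
            writer.close();
        }
    }
}
```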

2 Answers


It doesn't matter whether "you want to use a database" or not. What you're doing is essentially building a graph of data, and that is exactly what databases are designed for. Now you can choose to use one that someone else wrote and that is widely tested, or you can choose to roll your own. Either way you're using a database, whether you want to or not.

If you want a lightweight, embeddable, well-performing, document/graph "NoSQL" database that works well with Maven, OrientDB is your friend, and using it is very intuitive. Plus you can choose whether you want an in-memory database, a file-backed database, or a more traditional client/server setup, depending on your needs. The best part is that it has an object abstraction layer, so you don't even have to mess about with an ORM framework.
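For illustration, a rough sketch of what storing and querying parsed log entries could look like with OrientDB's document API (this assumes the classic `ODatabaseDocumentTx`-era API; the `LogEntry` class, field names, and database path are made up for the example):

```java
import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.record.impl.ODocument;
import com.orientechnologies.orient.core.sql.query.OSQLSynchQuery;

import java.util.List;

public class LogStore {
    public static void main(String[] args) {
        // File-backed ("plocal") database on disk; the path is a placeholder.
        ODatabaseDocumentTx db = new ODatabaseDocumentTx("plocal:/tmp/logdb");
        if (!db.exists()) {
            db.create();
        } else {
            db.open("admin", "admin");
        }
        try {
            // Store one parsed log line as a document.
            ODocument entry = new ODocument("LogEntry");
            entry.field("status", 404);
            entry.field("timestamp", "2013-02-14T11:45:02");
            entry.field("message", "File not found: /app/images/logo.png");
            entry.save();

            // Later: pull back every 404 in one query instead of re-parsing the file.
            List<ODocument> errors = db.query(
                    new OSQLSynchQuery<ODocument>("select from LogEntry where status = 404"));
            for (ODocument e : errors) {
                System.out.println(e.field("timestamp") + " " + e.field("message"));
            }
        } finally {
            db.close();
        }
    }
}
```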

You really should try it. It'll make all your pains go away.

Linky: http://www.orientdb.org/

Mikkel Løkke

You can use Apache Lucene. Use NIO file channels for dividing the file into chunks, and use Apache Lucene for indexing and text searching. This might not solve your complete problem, but it is a better solution if you do not wish to use a database.
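For illustration, a rough sketch of the indexing and search side, assuming Lucene 4.x (the field names, index path, and sample log line are placeholders):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

import java.io.File;

public class LogIndexer {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File("/tmp/log-index"));

        // Index one parsed log line; in practice this runs inside the parsing loop.
        IndexWriterConfig config =
                new IndexWriterConfig(Version.LUCENE_41, new StandardAnalyzer(Version.LUCENE_41));
        IndexWriter writer = new IndexWriter(dir, config);
        Document doc = new Document();
        doc.add(new StringField("status", "404", Field.Store.YES));
        doc.add(new TextField("line", "GET /missing.html 404 - file not found", Field.Store.YES));
        writer.addDocument(doc);
        writer.close();

        // Later: fetch all 404 lines without re-reading the original multi-GB file.
        DirectoryReader reader = DirectoryReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);
        TopDocs hits = searcher.search(new TermQuery(new Term("status", "404")), 1000);
        for (ScoreDoc hit : hits.scoreDocs) {
            System.out.println(searcher.doc(hit.doc).get("line"));
        }
        reader.close();
    }
}
```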

Maclean Pinto