My app has multiple threads that generate log files based on the work they perform. They typically run multiple iterations over several days and generate close to 15 to 20 GB of data. I extract specific fields from each iteration's logs and store them alongside the log.
I need to analyze these fields, and I may extract more data from the raw logs in the future. I find myself writing more and more code to manage these files, compute aggregates such as sum, average, min, and max, and generate reports from the results. I am also writing code to make sure the data generated by the threads is stored in files correctly. Can an appropriate database abstract away some of these problems?
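Here is a minimal sketch of the kind of per-field consolidation described above: group records by one field, then compute sum, min, max, and average of another. It is written in Python only for brevity. The record layout and the field names `thread` and `duration_ms` are hypothetical.

```python
from collections import defaultdict

# Hypothetical extracted records: one dict per log iteration.
records = [
    {"thread": "worker-1", "duration_ms": 120},
    {"thread": "worker-1", "duration_ms": 80},
    {"thread": "worker-2", "duration_ms": 200},
]

def summarize(records, group_key, value_key):
    """Group records by group_key and compute sum/min/max/avg/count of value_key."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec[value_key])
    summary = {}
    for key, values in groups.items():
        total = sum(values)
        summary[key] = {
            "sum": total,
            "min": min(values),
            "max": max(values),
            "avg": total / len(values),
            "count": len(values),
        }
    return summary

report = summarize(records, "thread", "duration_ms")
for thread, stats in sorted(report.items()):
    print(thread, stats)
```

Every new report of this kind means another hand-written pass like `summarize`. A database with group-by aggregation would take over exactly this work.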
Is there a database that meets the following requirements?
- Document-based.
- Allows data analysis such as sum, min, max, average, and consolidation based on specific fields.
- Allows extraction of new data from the log files.
- I don't need high-performance reads or writes; as noted above, it takes days to generate 20 GB of data.
- I might run several instances of the application in parallel, all accessing the same database.
- I would also like to be able to do joins.
- I am working in C#/.NET.
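The "extraction of new data" requirement can be illustrated with a minimal Python sketch that turns raw log lines into document-style records. The line format is a hypothetical assumption: a timestamp, a thread name, and then `key=value` pairs. The real log format is not shown here.

```python
import re

# Hypothetical raw log line format: "<timestamp> <thread> key=value key=value ..."
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<thread>\S+)\s+(?P<rest>.*)$")
KV_RE = re.compile(r"(\w+)=(\S+)")

def parse_line(line):
    """Turn one raw log line into a dict; returns None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    record = {"ts": m.group("ts"), "thread": m.group("thread")}
    # Every key=value pair becomes a field, so fields added to the logs later are picked up too.
    for key, value in KV_RE.findall(m.group("rest")):
        record[key] = int(value) if value.isdigit() else value
    return record

raw = [
    "2024-01-01T00:00:00 worker-1 iteration=1 duration_ms=120",
    "2024-01-01T00:05:00 worker-2 iteration=1 duration_ms=200 status=ok",
    "garbage",
]
records = [r for r in (parse_line(line) for line in raw) if r is not None]
```

The parser has no fixed schema, so records can gain fields over time. That schema-free shape is why a document-based store fits the requirement to extract more fields later.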
I came across RethinkDB, which looked like what I wanted, but it turns out it is not yet production-ready and is supported only on Linux.
Thanks...