I need to store a bunch of time series in a database but I'm concerned about both the size and the processing time.
To reduce size I've already used, in another project, zipped JSON to store each whole time series, and this is quite efficient in terms of storage space. But the problem is that to search for some data you first have to retrieve the whole time series, unzip it, and deserialize it, and of course you can't use the database's built-in querying capabilities like SQL SELECT/WHERE.
So you consume bandwidth to fetch the data, CPU to unzip it, and RAM to hold it, even if you need only one point...
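To make the trade-off concrete, here is a minimal sketch of that zipped/JSON approach (the data and function names are just illustrative, not from my actual project): the whole series becomes one opaque blob, so even a single-point lookup means inflating everything.

```python
import json
import zlib

# A toy time series of (ISO date, value) pairs -- hypothetical data.
series = [("2014-01-%02d" % d, 100.0 + d) for d in range(1, 31)]

def pack(series):
    """Serialize to JSON, then compress; the blob goes into a single DB cell."""
    return zlib.compress(json.dumps(series).encode("utf-8"))

def unpack(blob):
    """The reverse: to read even one point, the whole blob must be inflated."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

blob = pack(series)
# Finding one value still costs a full decompress + parse of the series.
restored = unpack(blob)
```

Note that the blob is invisible to the database engine: no index, no WHERE clause can reach inside it.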
This was not an issue for the previous project because the time series were always manipulated as a whole, essentially to be displayed in charts or exported to Excel, but this time I'd like at least a minimal ability to search the data in the database.
To allow this flexibility in data manipulation, e.g. using SQL, there is the "standard format": one row per date. But I have two concerns:
- a time series over 10 years can have around 3,000 values, so 3,000 rows; with 1M time series that could mean 3 billion rows! I'm not sure a "normal" database like MySQL or PostgreSQL can handle such a huge number of rows, but I hope I'm wrong
- I don't know how good DBMSs are at optimizing the space taken by all those cells, though as long as it's not "too" big it's OK
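For clarity, this is the kind of one-row-per-date schema I mean, sketched here with SQLite purely as a stand-in for MySQL/PostgreSQL (table and column names are made up): a composite key on (series_id, date) lets a point query touch only the rows it needs, instead of a whole blob.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ts (
        series_id INTEGER NOT NULL,   -- which time series
        date      TEXT    NOT NULL,   -- ISO-8601 day
        value     REAL    NOT NULL,
        PRIMARY KEY (series_id, date) -- also serves as the lookup index
    )
""")
conn.executemany(
    "INSERT INTO ts (series_id, date, value) VALUES (?, ?, ?)",
    [(1, "2014-01-%02d" % d, 100.0 + d) for d in range(1, 31)],
)

# A single-point lookup uses the index; no need to load the whole series.
row = conn.execute(
    "SELECT value FROM ts WHERE series_id = ? AND date = ?",
    (1, "2014-01-15"),
).fetchone()
```

The upside is exactly the SELECT/WHERE ability I'm after; the downside is the row count and per-row overhead I'm worried about above.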
I can choose any free database, so NoSQL is welcome too if it helps.
Do you have any suggestions, or better, some feedback from experience?
Thanks for any input.