I'm on a project that involves time series analytics, and I need to be able to let users upload a file containing their own time series (ie numbers with dates), for instance in a .csv file. Data contained in their files would then be accessible at any time, to be used within our system.
How could I do that? The ideas I've thought about:
- Create a table each time a user upload a file (and save somewhere the name of that table). If I have lots of users uploading lots of data, I may end up with tons of tables.
- Create one big fat monster table with basically three or four columns: the date of the value; the value; the dataset name (and/or the dataset's owner). Everything is uploaded in that table, and when Bob needs its weather data I just select (date,value) where owner = Bob and datasetname = weatherdata.
- In between solution: one table per user, and all Bob's datasets are in Bob's table.
- Completely different: just save the .csv file somewhere and read it when you need it.
I keep reading it's bad practice to have a varying number of tables (and I believe it). However my situation is slightly different from other questions I've seen on this site (most people seems to want to create one table per user, when they should create one row per user).
Some additional information:
- time series data may contain hundreds of thousands observations, maybe millions
- a priori, saved data should not be modified afterwards. However I guess it would be useful to let users append new data to their time series.
- a priori, I won't need to do complicated SQL select statements. I just want to read Bob's weather data and I'll probably use it in the chronological order - although you never know what tomorrow may bring.
- using PostgreSQL 9.1, if that's of any importance.
EDIT Reading some answers I realize I may have not done my job very well, I should have said that I'm clearly already evolving in a SQL environment; I already have a User table; when I write "table" I really mean "relation"; all my 4 ideas involve foreign keys somewhere; and RDBMS normalization is the paradigm unless something else is better. (All this not meaning I'm against not-sql solutions).