I am performing large scale wind simulations to produce hourly wind patterns over a city. The results is a time series of 2-dimensional contours. Currently I am storing the results in SQLite3
database tables with the following structure
Table: CFD
id, timestamp, velocity, cell_id
1 , 2010-01-01 08:00:00, 3.345, 1
2 , 2010-01-01 08:00:00, 2.355, 2
3 , 2010-01-01 08:00:00, 2.111, 3
4 , 2010-01-01 08:00:00, 6.432, 4
.., ..................., ....., .
1000 , 2010-01-01 09:00:00, 3.345, 1
1001 , 2010-01-01 10:00:00, 2.355, 2
1002 , 2010-01-01 11:00:00, 2.111, 3
1003 , 2010-01-01 12:00:00, 6.432, 4
.., ..................., ....., .
Actual create statement:
CREATE TABLE cfd(id INTEGER PRIMARY KEY, time DATETIME, u, cell_id integer)
CREATE INDEX idx_cell_id_cfd on cfd(cell_id)
CREATE INDEX idx_time_cfd on cfd(time)
(There are three of these tables, each for a different result variable)
where cell_id
is a reference to the cell in the domain representing a location in the city. See this picture to have an idea of what it looks like at a specific timestep.
The typical query performs some kind of aggregation on the time dimension and group by on cell_id
. For example, if I want to know the average local wind speed in each cell during a specific time interval, I would execute
select sum(time in ('2010-01-01 08:00:00','2010-01-01 13:00:00','2010-01-01 14:00:00', ...................., ,'2010-12-30 18:00:00','2010-12-30 19:00:00','2010-12-30 20:00:00','2010-12-30 21:00:00') and u > 5.0) from cfd group by cell_id
The number of timestamps can vary from 100 to 8,000.
This is fine for small databases, but it gets much slower for larger ones. For example, my last database was 60GB, 3 tables and each table had 222,000,000 rows.
Is there a better way to store the data? For example:
- would it make sense to create a different table for each day?
- would be better to use a separate table for the timesteps and then use a join?
- is there a better way of indexing?
I have already adopted all the recommendations in this question to maximise the performance.