I have a time series of values (e.g., a random walk of stock market prices) stored in a PostgreSQL database. It's a very large table, and I'd like queries over arbitrary time spans to be comparably fast, regardless of the length of the span. I have this chart in the back of my mind, and I wonder how they did it.
A simple example:
WITH t(ts, val) AS ( VALUES
  ('2012-10-04 00:00:00'::timestamp, 1.11::numeric),
  ('2012-10-04 00:00:01', 1.21),
  ('2012-10-04 00:00:02', 1.25),
  ('2012-10-04 00:00:03', 1.41),
  ('2012-10-04 00:00:04', 1.31),
  ('2012-10-04 00:00:05', 1.25),
  ('2012-10-04 00:00:06', 1.33)
)
SELECT * FROM t;
(Assume there's an index on the timestamp column.) The table is large, and it takes a long time to retrieve all values for a time span of, say, a quarter of a year. However, since all I want to do with that data is plot the overall trend, I don't really need the entire data set from that period; a representative subset would be fine.
Things that came to my mind:
- generate a list of sub-statements, each of which retrieves one arbitrary value for a short sub-interval (e.g., one value per hour); see the first sketch after this list.
- aggregate values, e.g. AVG(), grouped by date_trunc('hour', ts) or similar (but would this be any faster on its own? Probably better to keep another table that holds pre-aggregated values?); see the second sketch after this list.
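
A minimal sketch of the first idea, assuming the real table is called prices and has the same ts / val columns (both names are placeholders): DISTINCT ON keeps exactly one row per hour bucket.

-- One representative row per hour of the requested span
SELECT DISTINCT ON (date_trunc('hour', ts))
       ts, val
FROM   prices
WHERE  ts >= '2012-07-01' AND ts < '2012-10-01'
ORDER  BY date_trunc('hour', ts), ts;

With an index on ts the WHERE clause narrows the scan to the requested span, but every row in that span is still read to pick the representatives.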
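
And a sketch of the second idea, under the same naming assumptions (prices_hourly is a made-up name for the summary table): aggregate per hour on the fly, or pre-aggregate once into a separate table so plotting queries only touch the much smaller table.

-- On the fly: one averaged point per hour of the requested span
SELECT date_trunc('hour', ts) AS bucket,
       avg(val)               AS avg_val
FROM   prices
WHERE  ts >= '2012-07-01' AND ts < '2012-10-01'
GROUP  BY 1
ORDER  BY 1;

-- Pre-aggregated: build (and periodically refresh) a summary table
CREATE TABLE prices_hourly AS
SELECT date_trunc('hour', ts) AS bucket,
       avg(val) AS avg_val,
       min(val) AS min_val,
       max(val) AS max_val
FROM   prices
GROUP  BY 1;

CREATE INDEX ON prices_hourly (bucket);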
Is there a recommended way to achieve this?