I have a time series of values (e.g., a random walk of stock market prices) stored in a PostgreSQL database. It's a very large table, and I'd like queries over arbitrary time spans to be comparably fast, regardless of the length of the span. I have this chart in the back of my mind, and I wonder how they did it.
A simple example:
WITH t(ts, val) AS ( VALUES
  ('2012-10-04 00:00:00'::timestamp, 1.11::numeric),
  ('2012-10-04 00:00:01', 1.21),
  ('2012-10-04 00:00:02', 1.25),
  ('2012-10-04 00:00:03', 1.41),
  ('2012-10-04 00:00:04', 1.31),
  ('2012-10-04 00:00:05', 1.25),
  ('2012-10-04 00:00:06', 1.33)
)
SELECT * FROM t;
(Assume there's an index on the timestamp column.) The table is large, and it takes a long time to retrieve all values for a time span of, say, a quarter of a year. However, since all I want to do with that data is plot the overall trend, I don't really need the entire data set from that period; a representative subset would be fine.
Things that came to my mind:
- generate a list of sub-statements, each of which retrieves one arbitrary value for a short sub-interval (e.g., one value per hour); see the first sketch after this list.
- aggregate values, e.g. AVG(), grouped by date_trunc('hour', ts) or similar (but would this be any faster on its own? Probably better to keep another table that holds pre-aggregated values?); see the second sketch after this list.
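
A minimal sketch of the first idea, assuming the real table is called prices and has the same ts / val columns (both names are placeholders): DISTINCT ON keeps exactly one row per hour bucket.

-- One representative row per hour of the requested span
SELECT DISTINCT ON (date_trunc('hour', ts))
       ts, val
FROM   prices
WHERE  ts >= '2012-07-01' AND ts < '2012-10-01'
ORDER  BY date_trunc('hour', ts), ts;

With an index on ts the WHERE clause narrows the scan to the requested span, but every row in that span is still read to pick the representatives.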
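
And a sketch of the second idea, under the same naming assumptions (prices_hourly is a made-up name for the summary table): aggregate per hour on the fly, or pre-aggregate once into a separate table so plotting queries only touch the much smaller table.

-- On the fly: one averaged point per hour of the requested span
SELECT date_trunc('hour', ts) AS bucket,
       avg(val)               AS avg_val
FROM   prices
WHERE  ts >= '2012-07-01' AND ts < '2012-10-01'
GROUP  BY 1
ORDER  BY 1;

-- Pre-aggregated: build (and periodically refresh) a summary table
CREATE TABLE prices_hourly AS
SELECT date_trunc('hour', ts) AS bucket,
       avg(val) AS avg_val,
       min(val) AS min_val,
       max(val) AS max_val
FROM   prices
GROUP  BY 1;

CREATE INDEX ON prices_hourly (bucket);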
Is there a recommended way to achieve this?