I have a table with approx 8 million rows that I need time slice data from.
I'm using PostgreSQL 9.1.
I need to query this table for a sum of the max(start_time) associated 'data' values for each 'object_id' (given a list), on a per day basis. (In other words, the closest record to the end of each day, for each object_id in a specific list).
Here's the basic table structure:
CREATE TABLE checks (
id SERIAL PRIMARY KEY,
object_id INTEGER,
state INTEGER,
start_time TIMESTAMP,
data TEXT
);
data
is a TEXT
field, but has numeric
values (I can't change this aspect, but can convert using casts).
And here's the query I'm working with so far:
WITH object_ids AS (
SELECT object_id FROM objects WHERE object_id in (14845,12504,12451,12452)
),
records AS (
SELECT
data,
start_time,
MAX(start_time) OVER (PARTITION BY object_id)
FROM checks
WHERE
object_id IN (SELECT object_id FROM object_ids) AND
state = 0 AND
start_time BETWEEN '2013-05-01 00:00:00' AND '2013-05-02 00:00:00'
)
SELECT
SUM(data::bigint)
FROM
records
WHERE
max = start_time
I'll run this query for each day of the month to give me a set of chart data points.
I would love to modify this query so I don't have to run separate queries per day, but one query to return a set of per day values
start_time | sum
---------------------------
2013-05-01 00:00:00 | 39118
2013-05-02 00:00:00 | 98387
2013-05-03 00:00:00 | 8384
I've been researching time slice questions, and they are very helpful (I owe the fact that I'm using windowing functions to StackOverflow!), but I just can't make the leap to solving this problem.