0

I have a table with approx 8 million rows that I need time slice data from.
I'm using PostgreSQL 9.1.

I need to query this table for a sum of the max(start_time) associated 'data' values for each 'object_id' (given a list), on a per day basis. (In other words, the closest record to the end of each day, for each object_id in a specific list).

Here's the basic table structure:

CREATE TABLE checks (
  id SERIAL PRIMARY KEY,
  object_id INTEGER,
  state INTEGER,
  start_time TIMESTAMP,
  data TEXT
);

data is a TEXT field, but has numeric values (I can't change this aspect, but can convert using casts).

And here's the query I'm working with so far:

WITH object_ids AS ( 
    SELECT object_id FROM objects WHERE object_id in (14845,12504,12451,12452)
),
records AS (
    SELECT
        data,
        start_time,
        MAX(start_time) OVER (PARTITION BY object_id)
    FROM checks
    WHERE
        object_id IN (SELECT object_id FROM object_ids) AND
        state = 0 AND
        start_time BETWEEN '2013-05-01 00:00:00' AND '2013-05-02 00:00:00'
)   
SELECT 
    SUM(data::bigint) 
FROM   
    records
WHERE 
    max = start_time

I'll run this query for each day of the month to give me a set of chart data points.

I would love to modify this query so I don't have to run separate queries per day, but one query to return a set of per day values

start_time          | sum
---------------------------
2013-05-01 00:00:00 | 39118
2013-05-02 00:00:00 | 98387
2013-05-03 00:00:00 | 8384

I've been researching time slice questions, and they are very helpful (I owe the fact that I'm using windowing functions to StackOverflow!), but I just can't make the leap to solving this problem.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
krunchyklown
  • 45
  • 1
  • 8

2 Answers2

1
SELECT day, sum(data) AS total_per_day
FROM  (
   SELECT DISTINCT ON (object_id, 1)
          start_time::date, data::numeric
   FROM   checks c
   WHERE  object_id in (14845,12504,12451,12452)
   AND    state = 0
   AND    start_time >= '2013-04-01'::date
   AND    start_time <  '2013-05-05'::date   -- any range of days
   ORDER  BY object_id, 1, c.start_time DESC -- seems redundant, see text
   ) x
GROUP BY  1
ORDER BY  1

This gives you one row day with a sum. I realize it's much like what @Clodoaldo already posted, but I demonstrate correct DISTINCT ON syntax, plus some other improvements and some necessary explanation.

  • You can use DISTINCT ON to the the data-value per day. Should be simpler and faster than a window function:

  • The ORDER BY clause has to agree with the DISTINCT ON expressions (details in the linked answer). That's the reason for:

    ORDER  BY 1, start_time::date, start_time DESC
    

    The second item would seem redundand, but is needed here.

  • The get the date from a timestamp, just cast: start_time::date.

  • Careful when you include start_time::date (which I optimized out) in the SELECT list. In GROUP BY and ORDER BY output columns take precedence over input columns (as opposed to WHERE and HAVING where you can only reference input columns). You have to use a different alias for the output column or table-qualify the base column to refer to it: c.start_time

  • With timestamp you practically always will want to exclude the upper border. Details in this related answer:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0
select
    "day", sum(data) "data"
from (
    select distinct (1, 2)
        object_id,
        date_trunc('day', start_time)::date "day",
        start_time,
        "data"
    from checks
    where
        object_id in (14845,12504,12451,12452)
        and state = 0
    order by 1, 2, 3 desc
) s
group by 1
order by 1
Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260