1

In a postgres table I have store the speed of an object with a 10 seconds interval. The values are not available for every 10 seconds during the day; so it could be that there is no line for today 16:39:40

How would the query look like to get an relation containing the average of the speed for 1 minute (or 30sec or n-sec) intervals for a given day, assuming the non-existing rows mean a speed of 0.

speed_table
id (int, pk) ts (timestamp) speed (numeric)

I've built this query but am getting stuck on some important parts:

SELECT 
    date_trunc('minute', ts) AS truncated, 
    avg(speed)
FROM speed_table AS t 
WHERE ts >= '2014-06-21 00:00:00'
AND ts <= '2014-06-21 23:59:59'
AND condition2 = 'something'
GROUP BY date_trunc('minute', ts)
ORDER BY truncated
  • How can I alter the interval in something other then the result of the date_trunc function eg 5 minutes of 30 seconds?
  • How can I add the not available rows for the remaining of the day?
stUrb
  • 6,612
  • 8
  • 43
  • 71

2 Answers2

3

Simple and fast solution for this particular example:

SELECT date_trunc('minute', ts) AS minute
     , sum(speed)/6 AS avg_speed
FROM   speed_table AS t 
WHERE  ts >= '2014-06-21 0:0'
AND    ts <  '2014-06-20 0:0'  -- exclude dangling corner case
AND    condition2 = 'something'
GROUP  BY 1
ORDER  BY 1;

You need to factor in missing rows as "0 speed". Since a minute has 6 samples, just sum and divide by 6. Missing rows evaluate to 0 implicitly.

This returns no row for minutes with no rows at all.avg_speed for missing result rows is 0.

General query for arbitrary intervals

Works for all any interval listed in the manual for date_trunc():

SELECT date_trunc('minute', g.ts) AS ts_start
     , avg(COALESCE(speed, 0))    AS avg_speed
FROM  (SELECT generate_series('2014-06-21 0:0'::timestamp
                            , '2014-06-22 0:0'::timestamp
                            , '10 sec'::interval) AS ts) g
LEFT   JOIN speed_table t USING (ts)
WHERE (t.condition2 = 'something' OR
       t.condition2 IS NULL)                -- depends on actual condition!
AND    g.ts <> '2014-06-22 0:0'::timestamp  -- exclude dangling corner case
GROUP  BY 1
ORDER  BY 1;

The problematic part is the additional unknown condition. You would need to define that. And decide whether missing rows supplied by generate_series should pass the test or not (which can be tricky!).
I let them pass in my example (and all other rows with a NULL values).

Compare:
PostgreSQL: running count of rows for a query 'by minute'

Arbitrary intervals:
Truncate timestamp to arbitrary intervals

For completely arbitrary intervals consider @Clodoaldo's math based on epoch values or use the often overlooked function width_bucket(). Example:
Aggregating (x,y) coordinate point clouds in PostgreSQL Aggregating (x,y) coordinate point clouds in PostgreSQL

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • The unknown condition is just a integer selector: eg `where t.objectid= 56` – stUrb Jun 21 '14 at 16:19
  • @stUrb: The tricky parts: Can `t.objectid` be NULL? Do we assume missing rows to fulfill the condition? What's the difference between a row that is removed by the condition and a row that is just not there? – Erwin Brandstetter Jun 21 '14 at 16:21
  • The object id cannot be NULL. The tables stores the speed of multiple assets. The object_id just separates the different assests. – stUrb Jun 21 '14 at 16:59
2

If you had issued some data it would be possible to test so this can contain errors. Point them including the error message so I can fix.

select
    to_timestamp(
        (extract(epoch from ts)::integer / (60 * 2)) * (60 * 2)
    ) as truncated,
    avg(coalesce(speed, 0)) as avg_speed
from
    generate_series (
        '2014-06-21 00:00:00'::timestamp,
        '2014-06-22'::timestamp - interval '1 second',
        '10 seconds'
    ) ts (ts)
    left join
    speed_table t on ts.ts = t.ts and condition2 = 'something'
group by 1
order by 1

The example is grouped by 30 seconds. It is number of seconds since 1970-01-01 00:00:00 (epoch) divided by 120. When you want to group by 5 minutes divide it by 12 (60 / 5).

The generate_series in the example is generating timestamps at 1 second interval. It is left outer joined to the speed table so it fills the gaps. When the speed is null then coalesce returns 0.

Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260