Inspired by this great answer, I wrote the following query, which returns the AVG computed over 5-minute intervals for the last year.

What I would like is to get all the 5-minute intervals, with the average set to null when no rows fall into a particular timespan.

```sql
with intervals as (
  select
    (select min("timestamp") from public.hst_energy_d) + n       as start_timestamp,
    (select min("timestamp") from public.hst_energy_d) + n + 299 as end_timestamp
  from generate_series(extract(epoch from now())::bigint - 10596096000,
                       extract(epoch from now())::bigint, 300) n
)
select avg(meas."Al1") as "avg", islots.start_timestamp as "timestamp"
from public.hst_energy_d meas
  right outer join intervals islots
    on meas.timestamp >= islots.start_timestamp
   and meas.timestamp <= islots.end_timestamp
where
  meas.idinstrum = 4
  and meas.id_device = 122
  and meas.timestamp > extract(epoch from now()) - 10596096000
group by islots.start_timestamp, islots.end_timestamp
order by "timestamp";
```
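For reference, the bucketing that the `intervals` CTE builds can be sketched outside SQL. This hypothetical Python snippet (names are illustrative, not from the schema) mirrors what `generate_series(min_ts, now, 300)` produces, with each bucket ending at `start + 299` so buckets do not overlap:

```python
def make_intervals(first_ts: int, last_ts: int, step: int = 300):
    # One (start, end) pair per 5-minute (300 s) bucket, end = start + 299,
    # mirroring generate_series(first_ts, last_ts, 300) in the CTE.
    return [(start, start + step - 1)
            for start in range(first_ts, last_ts + 1, step)]

buckets = make_intervals(0, 900)
# → [(0, 299), (300, 599), (600, 899), (900, 1199)]
```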
Bertuz

2 Answers


I think I see what you're trying to do, and I wonder if using interval '5 minutes' liberally wouldn't be a better and easier-to-follow approach:

```sql
with times as (  -- find the first date in the dataset, up to today
  select
    date_trunc('minutes', min("timestamp")) -
    mod(extract('minutes' from min("timestamp"))::int, 5) * interval '1 minute' as bt,
    date_trunc('minutes', current_timestamp) -
    mod(extract('minutes' from current_timestamp)::int, 5) * interval '1 minute' as et
  from hst_energy_d
  where
    idinstrum = 4 and
    id_device = 122
), -- generate every possible range between these dates
ranges as (
  select
    generate_series(bt, et, interval '5 minutes') as range_start
  from times
), -- normalize your data to the 5-minute interval it belongs to
rounded_hst as (
  select
    date_trunc('minutes', "timestamp") -
    mod(extract('minutes' from "timestamp")::int, 5) * interval '1 minute' as round_time,
    *
  from hst_energy_d
  where
    idinstrum = 4 and
    id_device = 122
)
select
  r.range_start, r.range_start + interval '5 minutes' as range_end,
  avg(hd."Al1")
from
  ranges r
  left join rounded_hst hd on
    r.range_start = hd.round_time
group by
  r.range_start
order by
  r.range_start
```

By the way, the discerning eye may wonder why bother with the CTE rounded_hst and why not just use a "between" in the join. From everything I've tested and observed, the database will explode out all possibilities and then test the between condition in what amounts to a where clause -- a filtered cartesian. For this many intervals, that's guaranteed to be a killer.

Truncating each timestamp down to its nearest five-minute boundary allows for a standard SQL equi-join. I encourage you to test both, and I think you'll see what I mean.
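The rounding trick this relies on (truncate each timestamp down to its 5-minute boundary, then equi-join on the rounded value) can be illustrated in a language-neutral way. This Python sketch mirrors the `date_trunc`/`mod` arithmetic used in the `rounded_hst` CTE:

```python
from datetime import datetime

def round_down_5min(dt: datetime) -> datetime:
    # Same effect as: date_trunc('minutes', ts)
    #   - mod(extract('minutes' from ts)::int, 5) * interval '1 minute'
    return dt.replace(minute=dt.minute - dt.minute % 5, second=0, microsecond=0)

# 09:49:31 rounds down to the 09:45:00 bucket
round_down_5min(datetime(2016, 11, 17, 9, 49, 31))  # → datetime(2016, 11, 17, 9, 45)
```

Because every row is mapped to exactly one bucket start, the join condition is a plain equality, which the planner can hash or merge instead of testing a range per pair.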

-- EDIT 11/17/2016 --

Solution from the OP that takes into account that the timestamps are epoch numbers, not dates:

```sql
with times as (  -- find the first date in the dataset, up to today
  select
    date_trunc('minutes', to_timestamp(min("timestamp"))::timestamp) -
    mod(extract('minutes' from to_timestamp(min("timestamp"))::timestamp)::int, 5) * interval '1 minute' as bt,
    date_trunc('minutes', current_timestamp::timestamp) -
    mod(extract('minutes' from (current_timestamp)::timestamp)::int, 5) * interval '1 minute' as et
  from hst_energy_d
  where
    idinstrum = 4 and
    id_device = 122
), -- generate every possible range between these dates
ranges as (
  select
    generate_series(bt, et, interval '5 minutes') as range_start
  from times
), -- normalize your data to the 5-minute interval it belongs to
rounded_hst as (
  select
    date_trunc('minutes', to_timestamp("timestamp")::timestamp)::timestamp -
    mod(extract('minutes' from (to_timestamp("timestamp")::timestamp))::int, 5) * interval '1 minute' as round_time,
    *
  from hst_energy_d
  where
    idinstrum = 4 and
    id_device = 122
)
select
  extract('epoch' from r.range_start)::bigint as range_start,
  extract('epoch' from r.range_start + interval '5 minutes')::bigint as range_end,
  avg(hd."Al1")
from
  ranges r
  left join rounded_hst hd on
    r.range_start = hd.round_time
group by
  r.range_start
order by
  r.range_start;
```
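Since `"timestamp"` is stored as epoch seconds in a `BIGINT`, the same bucketing can also be done with plain integer arithmetic, skipping the `to_timestamp`/`date_trunc` round-trip entirely. A hedged Python illustration of the equivalent modulo floor (untested against the OP's schema):

```python
def round_epoch_to_5min(ts: int) -> int:
    # Floor an epoch-seconds value to the start of its 5-minute (300 s)
    # bucket -- equivalent to date_trunc + mod on the converted timestamp.
    return ts - ts % 300
```

In SQL this would be the expression `"timestamp" - "timestamp" % 300`, usable both as the join key and as the group key, and it stays in `bigint` the whole time.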
Hambone
  • what a neat query! Unfortunately `timestamp` is a `BIGINT` in place of a real `timestamp` one, hence my math with numbers in place of timestamps – Bertuz Nov 17 '16 at 09:49
  • I adapted your suggested code to work with bigint, but I'm not sure if it could still take advantage of a lighter SQL. [gist to the new SQL](https://gist.github.com/bertuz/5544663474b8a0850dcede03d1903a02) — can you please take a look and give me some feedback? Plus: you said Postgres performs better with your solution: how did you analyse that? By using `EXPLAIN`? Here is the [query plan](https://gist.github.com/bertuz/5f06ab81cde3e78231c94c3a76ad20f8) my query performs. Is it actually performing TWO scans on hst_energy_d? That's *a lot*! Thank you – Bertuz Nov 17 '16 at 10:34
  • Alright, `EXPLAIN` speaks for itself: your solution costs `48.47`; in contrast, mine costs `247.17` – Bertuz Nov 17 '16 at 11:38
  • Aah, I didn't realize the datatype was numeric. Sorry about that. Honestly, I was scratching my head when you referenced `epoch`, but now it makes total sense. I'll amend my answer with your solution. I'm glad it was helpful. – Hambone Nov 17 '16 at 14:48
  • 1
    Also, on the explain plan -- yeah, I know what you were thinking. I only knew to do this because of past experience (http://stackoverflow.com/questions/31030589/postgresql-joining-between-two-values). Any time I see a `between` in a join, I know there is going to be trouble. The syntax is so nice and clean, but the execution is anything but – Hambone Nov 17 '16 at 14:49

I think this post will suit you: Group DateTime into 5,15,30 and 60 minute intervals

It shows a way of grouping dates; I'd recommend building a scalar function for it.
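Generalizing that linked approach, the bucket width can simply be a parameter. A small Python sketch of grouping epoch timestamps into 5/15/30/60-minute intervals (a scalar SQL function could follow the same arithmetic; the function name is illustrative):

```python
def bucket_epoch(ts: int, minutes: int) -> int:
    # Return the start of the N-minute bucket containing
    # the epoch-seconds timestamp ts.
    step = minutes * 60
    return ts - ts % step

# 09:49:31 (expressed as seconds since midnight) by bucket width:
# 5 min → 09:45, 15 min → 09:45, 30 min → 09:30, 60 min → 09:00
```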

Juan Piaggio