3

I have a table with epoch values (one per minute, the epoch itself is in milliseconds) and temperatures.

select * from outdoor_temperature order by time desc;
     time      | value
---------------+-------
 1423385340000 |  31.6
 1423385280000 |  31.6
 1423385220000 |  31.7
 1423385160000 |  31.7
 1423385100000 |  31.7
 1423385040000 |  31.8
 1423384980000 |  31.8
 1423384920000 |  31.8
 1423384860000 |  31.8
 [...]

I want to get the lowest value (and highest, but that can be a separate query) that occurred in each day, and the specific time (preferably the original epoch time) when that occurred. I've managed to do it with date_trunc but that gives me the general day, rather than the specific time within that day:

select
    date_trunc('day',TIMESTAMP WITH TIME ZONE 'epoch' + (time/1000) * INTERVAL '1 second') as timestamp,
    min(value)
from outdoor_temperature
group by timestamp
order by min asc
limit 5;

       timestamp        | min
------------------------+------
 2015-03-27 00:00:00+10 | 10.7
 2015-03-28 00:00:00+10 | 10.8
 2015-01-30 00:00:00+10 | 13.6
 2015-03-17 00:00:00+10 | 14.0
 2015-03-29 00:00:00+10 | 14.5
(5 rows)

Is there some sort of join magic I need to do (my join-fu is extremely weak), or am I attacking this from totally the wrong direction? I tried DISTINCT ON but didn't manage to even get that working.

VirtualWolf
  • 653
  • 13
  • 23
  • Practically the same logic: http://stackoverflow.com/questions/29396069/distinct-on-query-w-order-by-max-value-of-a-column/29422013#29422013 – Erwin Brandstetter Apr 04 '15 at 02:49

3 Answers

2

You can start from this query:

SELECT date_trunc('minute', TIMESTAMP WITH TIME ZONE 'epoch' + (time/1000) * INTERVAL '1 second') AS timestamp, value AS temperature FROM outdoor_temperature

which returns two columns: the first is the epoch converted to a timestamp with minute precision. Since you need to find the lowest/highest value for each day, it would be nice to also have a column with just the date, rather than the full timestamp:

SELECT
    x.timestamp::date AS a,
    x.timestamp AS b,
    temperature AS c
FROM (
    SELECT date_trunc('minute', TIMESTAMP WITH TIME ZONE 'epoch' + (time/1000) * INTERVAL '1 second') AS timestamp, value AS temperature FROM outdoor_temperature
) AS x

So now you have the date in column "a", the timestamp in column "b", and the temperature value in the last column, "c".

The last step is to use "order by" in conjunction with a "distinct on" expression. This works better than group by, because you find the unique values of one column while keeping the associated values from the others:

select distinct on (y.a)
    y.a,
    y.b,
    y.c
from (
    SELECT
        x.timestamp::date AS a,
        x.timestamp AS b,
        temperature AS c
    FROM (
        SELECT date_trunc('minute', TIMESTAMP WITH TIME ZONE 'epoch' + (time/1000) * INTERVAL '1 second') AS timestamp, value AS temperature FROM outdoor_temperature
    ) AS x
) y
order by y.a, y.c
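
As an aside, the verbose epoch conversion can be shortened: Postgres's `to_timestamp()` accepts epoch seconds as `double precision`, so dividing the millisecond value by 1000.0 does the same job (a sketch, equivalent to the expression used above):

```sql
-- to_timestamp() converts epoch seconds to timestamptz directly;
-- date_trunc('minute', ...) can wrap it the same way as above.
SELECT to_timestamp(time / 1000.0) AS timestamp,
       value AS temperature
FROM outdoor_temperature;
```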
percy
  • 988
  • 9
  • 9
  • Thanks! That works, but takes ~1.3 seconds to run over my current table rows (~144k). Running `explain analyze` it's having to do a sequential scan, is there a more efficient way? My dataset is only going to keep getting larger... :) – VirtualWolf Apr 03 '15 at 09:48
  • Do you have any indexes created on this table? Maybe adding two extra columns with redundant information would be helpful? I mean, one extra field for this epoch converted to "date" and one to "timestamp". You would have to update those fields once you create a new row and when the "epoch" changes - but it can speed up reads, like the above select query. – percy Apr 03 '15 at 11:06
  • Sorry, yep, there's indexes on both columns. – VirtualWolf Apr 03 '15 at 11:56
  • Then try to create a temporary table, just to compare the performance: CREATE TEMPORARY TABLE _test AS SELECT x.timestamp::date AS a, x.timestamp AS b, temperature AS c FROM ( SELECT date_trunc('minute',TIMESTAMP WITH TIME ZONE 'epoch' + (time/1000) * INTERVAL '1 second') as timestamp, value AS temperature from _outdoor_temperature ) AS x and execute the following query instead: select distinct on(y.a) y.a, y.b, y.c from ( SELECT * FROM _test ) y order by y.a, y.c Presumably, converting dates and sorting it in the same query is the bottleneck. If this helps, consider adding redundancy... – percy Apr 03 '15 at 12:08
  • Ha, yeah, you're right on. With the temporary table it's 298ms to run. :) Thanks very much! – VirtualWolf Apr 04 '15 at 00:09
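
The redundant-column idea from the comments could be sketched as follows (the index name is illustrative; backfilling a large table will take one slow pass, after which reads no longer pay for the conversion):

```sql
-- Add a date column derived from the epoch value, backfill it, and index it.
ALTER TABLE outdoor_temperature ADD COLUMN date date;

UPDATE outdoor_temperature
SET date = (TIMESTAMP WITH TIME ZONE 'epoch'
            + time/1000 * INTERVAL '1 second')::date;

CREATE INDEX outdoor_temperature_date_idx ON outdoor_temperature (date);
```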
2
select day::date, min_value_timestamp, min_value, max_value_timestamp, max_value
from
    (
        select distinct on (1)
            date_trunc('day', timestamp with time zone 'epoch' + time/1000 * interval '1 second') as day,
            timestamp with time zone 'epoch' + (time/1000 * interval '1 second') as min_value_timestamp,
            value as min_value
        from outdoor_temperature
        order by 1, 3
    ) s
    inner join
    (
        select distinct on (1)
            date_trunc('day', timestamp with time zone 'epoch' + time/1000 * interval '1 second') as day,
            timestamp with time zone 'epoch' + (time/1000 * interval '1 second') as max_value_timestamp,
            value as max_value
        from outdoor_temperature
        order by 1, 3 desc
    ) v using (day)
order by 1
Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
1

Ok, thanks to @voycheck's suggestion I ended up adding another column of type date and populating that with just the date that corresponds to the time field, so the table looks like this:

 Column |  Type   | Modifiers
--------+---------+-----------
 time   | bigint  | not null
 value  | numeric |
 date   | date    |
Indexes:
    "outdoor_temperature_pkey" PRIMARY KEY, btree ("time")
    "outdoor_temperature_date_idx" btree (date)
    "outdoor_temperature_value_idx" btree (value)

Which then massively simplified and sped up the SQL query:

SELECT time, value FROM (
    SELECT DISTINCT ON (date)
        date, time, value
    FROM outdoor_temperature
    ORDER BY date, value desc
) t
ORDER BY t.value desc;
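
One further tweak to consider (a sketch, not benchmarked here): a composite index matching the inner query's ordering can let Postgres walk the index in `ORDER BY date, value DESC` order instead of sorting, which may help as the table keeps growing:

```sql
-- Matches the ORDER BY date, value DESC of the DISTINCT ON subquery.
CREATE INDEX outdoor_temperature_date_value_idx
    ON outdoor_temperature (date, value DESC);
```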
VirtualWolf
  • 653
  • 13
  • 23