I have a table tab1 in timescale db which has 3 columns tag, time, value. time and tag make up the pk for the table: (time, tag).
There are more than 500 lakhs (50 000 000) of rows. I need to find the latest timestamp or the max(time) for each of the N tags.
There are few things that I tried, I'll share my experience with each:
- inner query
SELECT "time", "tag", "value"
FROM tab1
WHERE ("tag","time") IN
(SELECT "tag", MAX("time") FROM tab1 WHERE "tag" IN(tag1,tag2) GROUP BY "tag" );
this is giving result but taking around 19 sec to execute which is beyond the acceptable limit
- using last function from timescale db
SELECT tag, last(time, time), last(value,time)
FROM tab1
WHERE "tag" IN (tag1,tag2) GROUP BY "tag" ;
this is giving output within 10 sec.
I need to find another feasible solution similar to 2nd one that is perhaps performing better. I tried few other things like LATERAL JOIN (3), WINDOW FUNCTIONS (ROW_NUMBER, PARTITION) (4) but the solution is not as expected.
- using lateral gives cross of multiple columns and not one single value with max time as expected. Moreover it is taking 15sec to execute but it might be due to wrong query.
SELECT table1."tag", table1."time",table1."value" from tab1 as table1
join lateral (
SELECT table2 ."tag",table2 ."time" from tab1 as table2
where table2."tag" = table1."tag"
order by table2."time" desc limit 1
) p on true
where table1."tag" in (tag1,tag2)
- when trying partition I want to put limit 1 like this: (partition by tag order by time desc limit 1) but it is giving syntax error. without limit 1 I am not getting the latest time.
SELECT * from
( SELECT *, row_number() over (partition by tag order by time desc) as rownum
from tab1) a
where tag in (tag1,tag2)
Can anyone suggest what's wrong with the queries in 3,4 or suggest if there is another alternative.