1

Context

I have this table:

Time Metric A Metric B Value
A A A 1
A A B 2
A A C 1
B A A 1
B A B 2
B A C 1

I want to get the last and summarize the table by Metric A, but sometimes Time goes bonkers and GROUP BY Time is not an option.

What I want is to get the last value of Metric B for each Metric A. Any tips on how to make sure I get the last inserted value for Metric B for Metric A?

Desired output

Time Metric A Metric B Value
A A A 1
A A B 2
A A C 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
HufflepuffBR
  • 443
  • 3
  • 15

3 Answers3

0

You could use rank() or dense_rank() to achieve what you're looking for.

create table table1 (
  time timestamp, 
  metric_a varchar(1), 
  metric_b varchar(1), 
  value integer
  );
  
insert into table1 values 
(current_timestamp, 'A','A',1),
(current_timestamp, 'A','B',2),
(current_timestamp, 'A','C',1),
(current_timestamp - interval '10 minutes', 'A','A',1), 
(current_timestamp - interval '10 minutes', 'A','B',2), 
(current_timestamp - interval '10 minutes', 'A','C',1), 
(current_timestamp, 'B','A',2),
(current_timestamp, 'B','B',3),
(current_timestamp, 'B','C',4),
(current_timestamp - interval '10 minutes', 'B','A',2), 
(current_timestamp - interval '10 minutes', 'B','B',3), 
(current_timestamp - interval '10 minutes', 'B','C',4);

select time, metric_a, metric_b, value 
from (
  select *, 
   dense_rank() over (partition by metric_a, metric_b order by time desc) as rnk
  from table1
  )z
where rnk = 1;
time metric_a metric_b value
2023-08-14T15:13:49.623Z A A 1
2023-08-14T15:13:49.623Z A B 2
2023-08-14T15:13:49.623Z A C 1
2023-08-14T15:13:49.623Z B A 2
2023-08-14T15:13:49.623Z B B 3
2023-08-14T15:13:49.623Z B C 4

View on DB Fiddle

Isolated
  • 5,169
  • 1
  • 6
  • 18
  • This - clearly correct - is the very definition of PostgreSQL's brilliant `distinct on` with [custom order by](https://stackoverflow.com/questions/9795660/postgresql-distinct-on-with-different-order-by). Have a look at it. Your solution is equivalent to simply `select distinct on (metric_a, metric_b) * from the_table order by metric_a, metric_b, time desc; `. – Stefanov.sm Aug 14 '23 at 17:01
  • @Stefanov.sm Nice, haven't used that before. – Isolated Aug 14 '23 at 17:04
  • 1
    It is _very_ useful and clean, I like it ver much. – Stefanov.sm Aug 14 '23 at 17:05
  • `dense_rank()` would make sense to get *multiple* latest rows per combination if they tie. That doesn't seem to apply. – Erwin Brandstetter Aug 15 '23 at 02:52
  • @Stefanov.sm: Just plain `DISTINCT ON`, no different order. That link points in the wrong direction. – Erwin Brandstetter Aug 15 '23 at 02:54
0

Using a subquery:

select t.* from tbl t where t.time = (select max(t1.time) 
   from tbl t1 where t1.metric_a = t.metric_a and t1.metric_b = t.metric_b)

See fiddle

Ajax1234
  • 69,937
  • 8
  • 61
  • 102
  • There's a problem because, thanks as how those tables are loaded, there can some seconds diff between fields that should be got together – HufflepuffBR Aug 14 '23 at 17:48
0

So basically you want the latest row for each combination (A,B).
A plain DISTINCT ON does that:

SELECT DISTINCT ON (metric_a, metric_b) *
FROM   tbl
ORDER  BY metric_a, metric_b, time DESC;

fiddle

A matching multicolumn index is typically very useful for performance:

CREATE INDEX ON tbl (metric_a, metric_b, time DESC);

DISTINCT ON is typically fastest, too, for few rows per group. (Which seems to be your case.) See:

If time isn't defined NOT NULL, you'll want to sort DESC NULLS LAST. See:

Aside: "time" is a bad name for a timestamp column, time being a different basic type name.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228