Gather the last entry for a specific metric from multiple fields

Question

Context

I have this table:

Time	Metric A	Metric B	Value
A	A	A	1
A	A	B	2
A	A	C	1
B	A	A	1
B	A	B	2
B	A	C	1

I want to get the last and summarize the table by Metric A, but sometimes Time goes bonkers and GROUP BY Time is not an option.

What I want is to get the last value of Metric B for each Metric A. Any tips on how to make sure I get the last inserted value for Metric B for Metric A?

Desired output

Time	Metric A	Metric B	Value
A	A	A	1
A	A	B	2
A	A	C	1

I want the first three line (Time A) to be queried. For some reasons, GROUP BY is not an option — HufflepuffBR, Aug 14 '23 at 13:28
But you still haven't explained your desired output based on your sample input. Is the answer 1, or C, or something else? Update the question with the desired output. — Isolated, Aug 14 '23 at 13:33
But these are the earliest values for metric B given that time A is before time B, not the latest as written in the question — Stefanov.sm, Aug 14 '23 at 13:51
OK... My bad.. What I need is the latest data (think I put a Order by TIME desc on it) — HufflepuffBR, Aug 14 '23 at 14:22

Isolated · Accepted Answer · 2023-08-14T15:09:33.150

0

You could use rank() or dense_rank() to achieve what you're looking for.

create table table1 (
  time timestamp, 
  metric_a varchar(1), 
  metric_b varchar(1), 
  value integer
  );
  
insert into table1 values 
(current_timestamp, 'A','A',1),
(current_timestamp, 'A','B',2),
(current_timestamp, 'A','C',1),
(current_timestamp - interval '10 minutes', 'A','A',1), 
(current_timestamp - interval '10 minutes', 'A','B',2), 
(current_timestamp - interval '10 minutes', 'A','C',1), 
(current_timestamp, 'B','A',2),
(current_timestamp, 'B','B',3),
(current_timestamp, 'B','C',4),
(current_timestamp - interval '10 minutes', 'B','A',2), 
(current_timestamp - interval '10 minutes', 'B','B',3), 
(current_timestamp - interval '10 minutes', 'B','C',4);

select time, metric_a, metric_b, value 
from (
  select *, 
   dense_rank() over (partition by metric_a, metric_b order by time desc) as rnk
  from table1
  )z
where rnk = 1;

time	metric_a	metric_b	value
2023-08-14T15:13:49.623Z	A	A	1
2023-08-14T15:13:49.623Z	A	B	2
2023-08-14T15:13:49.623Z	A	C	1
2023-08-14T15:13:49.623Z	B	A	2
2023-08-14T15:13:49.623Z	B	B	3
2023-08-14T15:13:49.623Z	B	C	4

View on DB Fiddle

edited Aug 14 '23 at 15:09

answered Aug 14 '23 at 15:00

Isolated

5,169
1
6
18

This - clearly correct - is the very definition of PostgreSQL's brilliant `distinct on` with [custom order by](https://stackoverflow.com/questions/9795660/postgresql-distinct-on-with-different-order-by). Have a look at it. Your solution is equivalent to simply `select distinct on (metric_a, metric_b) * from the_table order by metric_a, metric_b, time desc; `. – Stefanov.sm Aug 14 '23 at 17:01
@Stefanov.sm Nice, haven't used that before. – Isolated Aug 14 '23 at 17:04
1

It is _very_ useful and clean, I like it ver much. – Stefanov.sm Aug 14 '23 at 17:05
`dense_rank()` would make sense to get *multiple* latest rows per combination if they tie. That doesn't seem to apply. – Erwin Brandstetter Aug 15 '23 at 02:52
@Stefanov.sm: Just plain `DISTINCT ON`, no different order. That link points in the wrong direction. – Erwin Brandstetter Aug 15 '23 at 02:54

score 0 · Answer 2 · answered Aug 14 '23 at 16:26

0

Using a subquery:

select t.* from tbl t where t.time = (select max(t1.time) 
   from tbl t1 where t1.metric_a = t.metric_a and t1.metric_b = t.metric_b)

See fiddle

answered Aug 14 '23 at 16:26

Ajax1234

69,937
8
61
102

There's a problem because, thanks as how those tables are loaded, there can some seconds diff between fields that should be got together – HufflepuffBR Aug 14 '23 at 17:48

score 0 · Answer 3 · answered Aug 15 '23 at 02:46

So basically you want the latest row for each combination (A,B).
A plain DISTINCT ON does that:

SELECT DISTINCT ON (metric_a, metric_b) *
FROM   tbl
ORDER  BY metric_a, metric_b, time DESC;

fiddle

A matching multicolumn index is typically very useful for performance:

CREATE INDEX ON tbl (metric_a, metric_b, time DESC);

DISTINCT ON is typically fastest, too, for few rows per group. (Which seems to be your case.) See:

Select first row in each GROUP BY group?

If time isn't defined NOT NULL, you'll want to sort DESC NULLS LAST. See:

Sort by column ASC, but NULL values first?

Aside: "time" is a bad name for a timestamp column, time being a different basic type name.

Gather the last entry for a specific metric from multiple fields

Context

Desired output

3 Answers3