Get row filtering by max(col) in postgresql

Question

I'm using postgresql and I'm unfamiliar with it. This code works but I was wondering if I could write it in a more straightforward way. Here I join bar to join bar in a subquery. I was hoping there's something simple like select * from bar group by baz using max(z)

select *
from foo f
join bar b on(f.baz=b.baz AND b.z in (select max(z) from bar group by baz))
where uid1 = 120

Is `uid1` a column of `bar` or `foo`? The best query depends on it. Postgres version and table definitions would also be instrumental. — Erwin Brandstetter, Sep 28 '16 at 14:22
@ErwinBrandstetter uid is from bar but really I was just learning how to do this commonly written line in a more simple way. It appears `explain` prefers the distinct/orderby method and I do too. — , Sep 28 '16 at 14:30
What the Postgres query planner prefers (and you get to see with `EXPLAIN`), very much depends on the complete picture. With big tables, your query can be much faster one or the other way. Doesn't matter much with small tables. But performance optimization is beyond the scope of this question. — Erwin Brandstetter, Sep 28 '16 at 14:37

score 2 · Accepted Answer · answered Sep 28 '16 at 13:12

2

Just use distinct on:

select distinct on (f.baz) *
from foo f join
     bar b
     on f.baz = b.baz 
where uid1 = 120
order by f.baz, b.z desc;

answered Sep 28 '16 at 13:12

Gordon Linoff

1,242,037
58
646
786

score 1 · Answer 2 · edited May 23 '17 at 10:27

@Gordon's answer is almost but not quite correct. There is a subtle difference between max(b.z) and ORDER BY b.z DESC if the column can be NULL. Often overlooked and cause for much headache.

This can bite you, even with a column b.z defined NOT NULL. An outer join can introduce NULL values for the column. Wouldn't be effective in this simple query, but can be in more complex queries.

While aggregate functions like min, max or count ignore NULL values, those have to be dealt with somehow when the same column is used in the ORDER BY clause.
In default sort order NULL values sort last. When the sort order is reversed with DESC, NULL values sort first. It must be that way to stay consistent.
Hence, you get a row with baz IS NULL from DISTINCT ON if any NULL value is in the set of peers, where you would get the greatest not-null value (if any) from max(b.z). Most probably not what you want. You can fix it with NULLS LAST:

PostgreSQL sort by datetime asc, null first?

Plus, while using SELECT *, you would not want to return the join column baz twice - guaranteed to be identical. The USING clause comes in handy, it only returns the column once.

So:

SELECT DISTINCT ON (baz) *
FROM   foo f
JOIN   bar b USING (baz)
WHERE  uid1 = 120   -- where from?
ORDER  BY baz, b.z DESC NULLS LAST;

NULLS LAST won't hurt, even if there are no NULL values - unless you actually want NULL values to prevail.

Detailed explanation for DISTINCT ON:

Select first row in each GROUP BY group?

Faster

Since uid1 is from baz (as commented), this query is typically faster for big tables:

SELECT *
FROM   foo f
JOIN  (
   SELECT DISTINCT ON (baz) *
   FROM   bar
   WHERE  uid1 = 120
   ORDER  BY baz, z DESC NULLS LAST
   ) b USING (baz);

Depending on table definition and data distribution, there may be even faster query techniques:

Optimize GROUP BY query to retrieve latest record per user

My actual code does `where uid1 = :uid OR uid2=:uid` uid is either the current logged in user or whoever the admin is impersonating. The column is never/not null — , Sep 28 '16 at 14:27
@acidzombie24: If you are interested in performance optimization I suggest you write a new question where you provide exact table definitions (`CREATE TABLE` statements) and your Postgres version along with your actual query. The simplification here may be misleading. You can add a comment here to get my attention. — Erwin Brandstetter, Sep 28 '16 at 14:29

score 0 · Answer 3 · answered Sep 28 '16 at 13:18

0

Hi,
You can have below query,

SELECT * FROM foo f
INNER JOIN (SELECT baz, MAX(z) FROM bar GROUP BY baz) b
ON (f.baz = b.baz)
WHERE f.uid1 = 120;

Just joining bar with the MAX(z) derived and populating your required value. uid1 must be from foo table and since f is used as alias, if not change to b.uid1

answered Sep 28 '16 at 13:18

Jim Macaulay

4,709
4
28
53

I wasn't clear with my original code bc I was lazy and just put * but I need to access the entire row (bar or b.*). Which is why I joined bar with itself. – Sep 28 '16 at 13:28

Get row filtering by max(col) in postgresql

3 Answers3

Faster