Tricky Postgresql query

Question

Given the following table.

CREATE TABLE example (
    a integer,
    b integer,
    c integer,
    UNIQUE (a, b)
);

How do I get one row for every a such that c is the max of that a?

For example given the table below,

a|b|c
-----
1 1 1
1 2 2
2 1 9
3 2 4
3 3 5
3 4 6

I should get the back

a|b|c
-----
1 2 2
2 1 9
3 4 6

Maybe the title of this question should be "Not so tricky SQL" — Hogan, Mar 02 '15 at 19:13
See also (possible duplicate?) http://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group?rq=1 and (MySQL but some portable techniques) http://stackoverflow.com/questions/8748986/get-records-with-highest-smallest-whatever-per-group — IMSoP, Mar 05 '15 at 12:46

jpw · Accepted Answer · 2015-03-05T12:37:32.193

1

The trick is to find the max c for every a in a derived table that you join with, like this:

select a, b, c
from example
join (select a, max(c) max_c from example group by a) max_c
on example.a = max_c.a and example.c = max_c.max_c

edited Mar 05 '15 at 12:37

answered Mar 02 '15 at 19:11

jpw

44,361
6
66
86

It should be 'example.a = max_c.a ' in the last line instead of max_c.c – Srinath Mandava Mar 05 '15 at 06:49
This will return more than one row if there are multiple rows with the same `a` and `c` values, e.g. {1, 0, 2}, {1, 1, 0}, {1, 2, 2}, {1, 3, 1}, the `max_c` in the inner query for a=1 will be 2, and the join will match both {1, 0, 2} and {1, 2, 2} – IMSoP Mar 05 '15 at 12:43
@IMSoP Indeed it would. But it would be distinct rows. I don't think that's an error. It's impossible to tell from looking at the data alone. It would be easy to add a `max(b)` and `group by a,c` to fix it. – jpw Mar 05 '15 at 12:47
@jpw The question asks for "one row for every a", but this solution only guarantees that if you can add a unique constraint on `(a, c)`. The rows are distinct, but they're distinct on column `b`, which is our "output" value, so you have to tie-break them somehow. One option in this (simplified) example is to take `min(b)`, but if there were more columns, you would then be mixing values from different rows, which is not what is generally needed in this scenario. – IMSoP Mar 05 '15 at 12:51
@IMSoP I guess maybe your last comment addressed my previous comment before I added the part with max and group by? – jpw Mar 05 '15 at 12:53
@jpw We overlapped a bit, yes. My point stands, though: without an aggregate on `b`, or a constraint on `(a, c)`, this doesn't deliver the aim of "one row for every a". With an aggregate, it doesn't generalise to other scenarios, e.g. for 4 columns `a, b, c, d`, taking `a, max(b), c, max(d)` means you're no longer using a single row of the input for each row of the output, since you might have one row with a very high `b` and another with a very high `d`. – IMSoP Mar 05 '15 at 12:57
@IMSoP Yes you're right, but now you're also talking about a different scenario that isn't what this specific question asks about. Feel free to post a better solution though. – jpw Mar 05 '15 at 13:03
@jpw The scenario is clearly simplified, so it's important to note the limitations of any solution, so that readers (not just the user who asked) can decide if they need to look for some more complex or not. I've linked to a couple of existing questions in the comments under the question which have some more broadly applicable (but fiddlier) techniques. – IMSoP Mar 05 '15 at 13:06

Tricky Postgresql query

1 Answers1