postgresql: how to get primary key from a group by clause?

Question

This is a query which selects a set of desired rows:

select max(a), b, c, d, e
from T
group by b, c, d, e;

The table has a primary key, in column id.

I would like to identify these rows in a further query, by getting the primary key from each of those rows. How would I do that? This does not work:

select id, max(a), b, c, d, e
from T 
group by b, c, d, e;

ERROR:  column "T.id" must appear in the GROUP BY clause or be used in an aggregate function

I have tried this from poking around in some other postgresql questions, but no luck:

select distinct on (id) id, max(a), b, c, d, e
from T 
group by b, c, d, e;

ERROR:  column "T.id" must appear in the GROUP BY clause or be used in an aggregate function

What do I do? I know there can only be one id for each result, cause it's a primary key... I literally want the primary key along with the rest of the data, for each row that the initial (working) query returns.

Are you trying to get the id of the row with the maximum a value, or are you trying to get some id that has b,c,d,e that may or may not have the maximum a value? — evil otto, Oct 28 '11 at 23:26
@evilotto: the ids of exactly those rows that the initial query returns — Claudiu, Oct 28 '11 at 23:29
the rows that your original query returns do not correspond to single rows in your database. For each row returned by the query there can be 1 or more table rows that correspond. — evil otto, Oct 29 '11 at 00:09
@evilotto: i see. i'm trying to get the id of the row with the maximum a value, yes — Claudiu, Oct 29 '11 at 00:23
@Claudiu: unless `a` is unique, there can be *more that one* rows with the maximum `a`, and therefore more than one `id` that match the criteria. You are aware of that, right? — Erwin Brandstetter, Oct 29 '11 at 23:37

mu is too short · Accepted Answer · 2011-10-29T00:39:55.790

5

If you don't care which id you get then you just need to wrap your id in some aggregate function that is guaranteed to give you a valid id. The max and min aggregates come to mind:

-- Or min(id) if you want better spiritual balance.
select max(id), max(a), b, c, d, e
from T 
group by b, c, d, e;

Depending on your data I think using a window function would be a better plan (thanks to evil otto for the boot to the head):

select id, a, b, c, d, e
from (
    select id, a, b, c, d, e, rank() over (partition by b,c,d,e order by a desc) as r
    from T
) as dt
where r = 1

edited Oct 29 '11 at 00:39

answered Oct 28 '11 at 23:07

mu is too short

426,620
70
833
800

2

Be aware that the rows returned by this query will not necessarily be rows in your table. – evil otto Oct 29 '11 at 00:12
@evil: That's a good point and might be a problem depending on the data and how accurate "I want any given id record and I don't care which one?" is (see the comment on gahooa's answer). – mu is too short Oct 29 '11 at 00:20
@evilotto: could you go into how that might be the case? – Claudiu Oct 29 '11 at 00:24
consider the rows (99,1,1,1,1,1) and (1,99,1,1,1,1) -- max(id) is 99, max(a) is 99, this query gives both maxes: (99,99,1,1,1,1) – evil otto Oct 29 '11 at 00:27
@evil: I think a window function might be a better approach, I just added a possibility if you'd like to double check. – mu is too short Oct 29 '11 at 00:40
@Claudiu: My apologies for the brain damage but I think my update is what you're after but we'll see if Mr. Otto will double check it as I'm a bit out of at the moment. – mu is too short Oct 29 '11 at 00:42

score 4 · Answer 2 · edited May 23 '17 at 12:17

4

By virtue of the fact that you are grouping, there can (and will likely) be more than one matched record (eg, more than one id value) per returned record.

PostgreSQL is pretty strict - it will not guess at what you mean.

you could run a subquery
you could run another query based on b,c,d,e
you could use a array_agg grouping function to get an array of id values per record.

See this question: Postgresql GROUP_CONCAT equivalent?

I suggest you consider #3 as the most efficient of the possibilities.

Hope this helps. Thanks!

edited May 23 '17 at 12:17

Community

1
1

answered Oct 28 '11 at 21:12

gahooa

131,293
12
98
101

can I tell postgres that I want any given `id` record and I don't care which one? (since I do not) – Claudiu Oct 28 '11 at 21:21

score 2 · Answer 3 · answered Oct 29 '11 at 00:35

It's not what you asked, but I suspect what you are trying to do is to get the single row corresponding to the maximum value of one column for each group defined by several other columns. For example, to find the day that was the hottest monday/tuesday/etc for an entire year.

The best way I've found to do this uses a view to find all the maximum values for the groups. If your original table is

create table T as (
  id integer primary key,
  a integer,
  b integer,
  c integer,
  d integer)

then create the "max" view as

create view T_maxgroups as 
  select max(a) as a, b, c, d 
  from T
  group by b, c, d

(which is your initial query) and then join that view to your table to get rows with max values:

select T.* from T join maxgroups using (a,b,c,d)

ah yes this is indeed what i want! i didn't even realize that isn't what i asked... — Claudiu, Oct 29 '11 at 01:20

Joe Stefanelli · Answer 4 · 2011-10-28T21:15:29.043

1

Use your original query as a subquery, and use those results to join back to the original table to find the id.

SELECT T.id, T.a, T.b, T.c, T.d, T.e
    FROM (SELECT max(a) AS MaxA, b, c, d, e
              FROM T
              GROUP BY b,c,d,e) q
        INNER JOIN T
            ON T.a = q.MaxA
                AND T.b = q.b
                AND T.c = q.c
                AND T.d = q.d
                AND T.e = q.e

edited Oct 28 '11 at 21:15

answered Oct 28 '11 at 21:09

Joe Stefanelli

132,803
19
237
235

@Claudiu It shouldn't be. Adding some indexing on (a,b,c,d,e) would probably help it out. – Joe Stefanelli Oct 28 '11 at 21:24

score -2 · Answer 5 · answered Oct 28 '11 at 21:29

-2

juuust curious does adding it to the group by clause work?

select T.id, max(a), b, c, d, e
from T 
group by T.id b, c, d, e;

answered Oct 28 '11 at 21:29

user800612

169
2
9

postgresql: how to get primary key from a group by clause?

5 Answers5