76

I have a table with id, year and count.

I want to get the MAX(count) for each id and keep the year when it happens, so I make this query:

SELECT id, year, MAX(count)
FROM table
GROUP BY id;

Unfortunately, it gives me an error:

ERROR: column "table.year" must appear in the GROUP BY clause or be used in an aggregate function

So I try:

SELECT id, year, MAX(count)
FROM table
GROUP BY id, year;

But then, it doesn't do MAX(count), it just shows the table as it is. I suppose because when grouping by year and id, it gets the max for the id of that specific year.

So, how can I write that query? I want to get the id´s MAX(count) and the year when that happens.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Project Dumbo Dev
  • 763
  • 1
  • 5
  • 5
  • 2
    if {id,year} are unique, `max(thing)` is the same as `thing`. Also note that "count" is a keyword, (and year as well in some dialects, IIRC) – wildplasser Nov 10 '12 at 20:04
  • What year would you want to get with each id? There isn't a "the year", there are more then one, maybe you want `MAX(year)`? – mata Nov 10 '12 at 20:07
  • Yeah, they are unique therefore I get thing. I want to get ids MAX(thing) and also, see in what year that happens. (I wasn't writing count in my code, just an example) – Project Dumbo Dev Nov 10 '12 at 20:12
  • I wanna get the year in which that ID maxes the "thing" column – Project Dumbo Dev Nov 10 '12 at 20:13

2 Answers2

105

The shortest (and possibly fastest) query would be with DISTINCT ON, a PostgreSQL extension of the SQL standard DISTINCT clause:

SELECT DISTINCT ON (1)
       id, count, year
FROM   tbl
ORDER  BY 1, 2 DESC, 3;

The numbers refer to ordinal positions in the SELECT list. You can spell out column names for clarity:

SELECT DISTINCT ON (id)
       id, count, year
FROM   tbl
ORDER  BY id, count DESC, year;

The result is ordered by id etc. which may or may not be welcome. It's better than "undefined" in any case.

It also breaks ties (when multiple years share the same maximum count) in a well defined way: pick the earliest year. If you don't care, drop year from the ORDER BY. Or pick the latest year with year DESC.

For many rows per id, other query techniques are (much) faster. See:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Perfect solution with the less query execution time. I tried with grouping and getting the max data. But it was slow. Thank you Erwin.. – REMITH Jun 06 '18 at 11:52
  • i have another problem same to that , i have two output from my MAX() , so the outputs is only the first occurred one i want to get both of them how i can do that ? – Bassam Abdeltwab Mar 31 '19 at 15:38
  • Also a good read on potentially better alternatives and discussion https://wiki.postgresql.org/wiki/Loose_indexscan – mlt Nov 06 '19 at 20:27
61
select *
from (
  select id, 
         year,
         thing,
         max(thing) over (partition by id) as max_thing
  from the_table
) t
where thing = max_thing

or:

select t1.id,
       t1.year,
       t1.thing
from the_table t1
where t1.thing = (select max(t2.thing) 
                  from the_table t2
                  where t2.id = t1.id);

or

select t1.id,
       t1.year,
       t1.thing
from the_table t1
  join ( 
    select id, max(t2.thing) as max_thing
    from the_table t2
    group by id
  ) t on t.id = t1.id and t.max_thing = t1.thing

or (same as the previous with a different notation)

with max_stuff as (
  select id, max(t2.thing) as max_thing
  from the_table t2
  group by id
) 
select t1.id, 
       t1.year,
       t1.thing
from the_table t1
  join max_stuff t2 
    on t1.id = t2.id 
   and t1.thing = t2.max_thing
  • @user1504577: All of these queries return multiple values per id when multiple years share the maximum count. You would have to define what you want in this common case. Show all? Pick one? The latest / earliest / whatever? Show a list of years in one column? – Erwin Brandstetter Nov 11 '12 at 02:21
  • @a_horse_with_no_name can you explain the pros and cons of each of these queries? – Stratus3D Jan 03 '17 at 15:04
  • choose this answer as it provide the query that works across different SQL engines, migrate between server is already as annoying as f**k. – qichao_he Jun 10 '18 at 03:32