Select multiple (non-aggregate function) columns with GROUP BY

Question

I am trying to select the max value from one column, while grouping by another non-unique id column which has multiple duplicate values. The original database looks something like:

mukey    | comppct_r | name | type
65789    | 20        | a    | 7n
65789    | 15        | b    | 8m
65789    | 1         | c    | 1o
65790    | 10        | a    | 7n
65790    | 26        | b    | 8m
65790    | 5         | c    | 1o
...

This works just fine using:

SELECT c.mukey, Max(c.comppct_r) AS ComponentPercent
FROM c
GROUP BY c.mukey;

Which returns a table like:

mukey    | ComponentPercent
65789    | 20
65790    | 26
65791    | 50
65792    | 90

I want to be able to add other columns in without affecting the GROUP BY function, to include columns like name and type into the output table like:

mukey    | comppct_r | name | type
65789    | 20        | a    | 7n
65790    | 26        | b    | 8m
65791    | 50        | c    | 7n
65792    | 90        | d    | 7n

but it always outputs an error saying I need to use an aggregate function with select statement. How should I go about doing this?

What is your expected result? – Steven Wexler Jul 01 '13 at 18:31 — Steven Wexler, Jul 01 '13 at 18:31
Added that to the op, Thanks – Dylansq Jul 01 '13 at 18:35 — Dylansq, Jul 01 '13 at 18:35

score 23 · Accepted Answer · edited May 23 '17 at 11:46

23

You have yourself a greatest-n-per-group problem. This is one of the possible solutions:

select c.mukey, c.comppct_r, c.name, c.type
from c yt
inner join(
    select c.mukey, max(c.comppct_r) comppct_r
    from c
    group by c.mukey
) ss on c.mukey = ss.mukey and c.comppct_r= ss.comppct_r

Another possible approach, same output:

select c1.*
from c c1
left outer join c c2
on (c1.mukey = c2.mukey and c1.comppct_r < c2.comppct_r)
where c2.mukey is null;

There's a comprehensive and explanatory answer on the topic here: SQL Select only rows with Max Value on a Column

edited May 23 '17 at 11:46

Community

1
1

answered Jul 01 '13 at 18:32

Adriano Carneiro

57,693
12
90
123

What does the 'ss' in "ss on c.mukey = ss.mukey and c.comppct_r= ss.comppct_r" mean? – Dylansq Jul 01 '13 at 18:51
1

`ss` is the name I gave to the derived table (the structure declared between parenthesis) – Adriano Carneiro Jul 01 '13 at 18:57
how second solution works? in my case it does nothing - just displays all rows. It seems that that approach must use group by as well... – Akmal Salikhov Feb 04 '17 at 14:25
How fast is the first query, if there are a lot of rows in a table? e.g. One million, and if limit is applied e.g. limit 50 – tylik Apr 19 '19 at 15:33

cosmos · Answer 2 · 2013-07-01T18:46:54.380

Any non-aggregate column should be there in Group By clause .. why??

                  t1
x1           y1           z1
1            2             5
2            2             7

Now you are trying to write a query like:

select x1,y1,max(z1) from t1 group by y1;

Now this query will result only one row, but what should be the value of x1?? This is basically an undefined behaviour. To overcome this, SQL will error out this query.

Now, coming to the point, you can either chose aggregate function for x1 or you can add x1 to group by. Note that this all depends on your requirement.

If you want all rows with aggregation on z1 grouping by y1, you may use SubQ approach.

Select x1,y1,(select max(z1) from t1  where tt.y1=y1 group by y1)
 from t1 tt;

This will produce a result like:

                  t1
x1           y1           max(z1)
1            2             7
2            2             7

score 0 · Answer 3 · answered Jul 01 '13 at 18:31

0

Try using a virtual table as follows:

SELECT vt.*,c.name FROM(
SELECT c.mukey, Max(c.comppct_r) AS ComponentPercent
FROM c
GROUP BY c.muke;
) as VT, c 
WHERE VT.mukey = c.mukey

answered Jul 01 '13 at 18:31

rgrano

351
1
6

score 0 · Answer 4 · answered Jul 01 '13 at 18:32

You can't just add additional columns without adding them to the GROUP BY or applying an aggregate function. The reason for that is, that the values of a column can be different inside one group. For example, you could have two rows:

mukey    | comppct_r | name | type
65789    | 20        | a    | 7n
65789    | 20        | b    | 9f

How should the aggregated group look like for the columns name and type?

If name and type is always the same inside a group, just add it to the GROUP BY clause:

SELECT c.mukey, Max(c.comppct_r) AS ComponentPercent
FROM c
GROUP BY c.muke, c.name, c.type;

score 0 · Answer 5 · answered Mar 19 '21 at 14:17

0

Use a 'Having' clause

SELECT *
FROM c
GROUP BY c.mukey
HAVING c.comppct_r = Max(c.comppct_r);

answered Mar 19 '21 at 14:17

shashin

51
1
4

Select multiple (non-aggregate function) columns with GROUP BY

5 Answers5

Linked

Related