How to group specific SQL columns and retrieve rows with highest counts for those columns?

Question

I have the following data:

col_1 | col_2 | col_3 | col_4
-----------------------------
a1      b1      c1      d1
a1      b2      c1      d1
a1      b3      c1      d1
a1      b4      c1      d2
a1      b5      c2      d2
a1      b6      c2      d2
a1      b7      c1      d3
a1      b8      c2      d3
a1      b9      c3      d3
a1      b10     c1      d2
a1      b11     c2      d3
a2      b12     c1      d1
a3      b13     c1      d1

I am interested in being able to:

Return rows where the value for col_1 is unique
For each row in the result, it should return the values for the columnns that have the highest counts when grouping by: col_3, col_4

For example, I would like the output to return the following:

col_1 | col_2 | col_3 | col_4
-----------------------------
a1      b1      c1      d1
a2      b12     c1      d1
a3      b13     c1      d1

Notice in the result that each value in col_1 is unique. Also note that for a1, it returned with c1 and d1 as they had the highest group by counts for a1.

How can I achieve this by SQL query? I will be using it for a Hive SQL query.

You've tagged your question with several [tag:rdbms]s. Please [edit] it and retag it only with the one you're actually using. — Mureinik, Mar 22 '20 at 21:14
Does this answer your question? [How to select the first row of each group?](https://stackoverflow.com/questions/33878370/how-to-select-the-first-row-of-each-group) — user10938362, Mar 22 '20 at 21:25

score 1 · Answer 1 · answered Mar 22 '20 at 21:22

You can use aggregation and window functions:

select col_1, col_2, col_3, col_4
from (
    select
        col_1, 
        col_2, 
        col_3, 
        col_4, 
        rank() over(partition by col_1 order by count(*) desc) rn
    from mytable t
    group by col_1, col_2, col_3, col_4
) t
where rn = 1

score 1 · Accepted Answer · answered Mar 22 '20 at 21:27

With row_number() window function:

select t.col_1, t.col_2, t.col_3, t.col_4
from (
  select col_1, min(col_2) col_2, col_3, col_4,
    row_number() over (partition by col_1 order by count(*) desc) rn
  from tablename
  group by col_1, col_3, col_4
) t
where t.rn = 1

See the demo.
Results:

| col_1 | col_2 | col_3 | col_4 |
| ----- | ----- | ----- | ----- |
| a1    | b1    | c1    | d1    |
| a2    | b12   | c1    | d1    |
| a3    | b13   | c1    | d1    |

score 1 · Answer 3 · answered Mar 22 '20 at 21:32

You can use window functions if you want the complete rows:

select t.*
from (select t.*,
             rank() over (partition by col1 order by cnt desc) as seqnum
      from (select t.*, count(*) over (partition by col1, col3, col4) as cnt
            from t
           ) t
     ) t
where seqnum = 1;

The innermost subquery counts the number of rows for each col1/col3/col4 combination. The middle subquery enumerates the rows the highest count for each col1. The outermost filters for the highest count.

How to group specific SQL columns and retrieve rows with highest counts for those columns?

3 Answers3