-1

I have the following data:

col_1 | col_2 | col_3 | col_4
-----------------------------
a1      b1      c1      d1
a1      b2      c1      d1
a1      b3      c1      d1
a1      b4      c1      d2
a1      b5      c2      d2
a1      b6      c2      d2
a1      b7      c1      d3
a1      b8      c2      d3
a1      b9      c3      d3
a1      b10     c1      d2
a1      b11     c2      d3
a2      b12     c1      d1
a3      b13     c1      d1

I am interested in being able to:

  • Return rows where the value for col_1 is unique
  • For each row in the result, it should return the values for the columnns that have the highest counts when grouping by: col_3, col_4

For example, I would like the output to return the following:

col_1 | col_2 | col_3 | col_4
-----------------------------
a1      b1      c1      d1
a2      b12     c1      d1
a3      b13     c1      d1

Notice in the result that each value in col_1 is unique. Also note that for a1, it returned with c1 and d1 as they had the highest group by counts for a1.

How can I achieve this by SQL query? I will be using it for a Hive SQL query.

GMB
  • 216,147
  • 25
  • 84
  • 135
code
  • 5,294
  • 16
  • 62
  • 113
  • You've tagged your question with several [tag:rdbms]s. Please [edit] it and retag it only with the one you're actually using. – Mureinik Mar 22 '20 at 21:14
  • 1
    Does this answer your question? [How to select the first row of each group?](https://stackoverflow.com/questions/33878370/how-to-select-the-first-row-of-each-group) – user10938362 Mar 22 '20 at 21:25

3 Answers3

1

You can use aggregation and window functions:

select col_1, col_2, col_3, col_4
from (
    select
        col_1, 
        col_2, 
        col_3, 
        col_4, 
        rank() over(partition by col_1 order by count(*) desc) rn
    from mytable t
    group by col_1, col_2, col_3, col_4
) t
where rn = 1
GMB
  • 216,147
  • 25
  • 84
  • 135
1

With row_number() window function:

select t.col_1, t.col_2, t.col_3, t.col_4
from (
  select col_1, min(col_2) col_2, col_3, col_4,
    row_number() over (partition by col_1 order by count(*) desc) rn
  from tablename
  group by col_1, col_3, col_4
) t
where t.rn = 1

See the demo.
Results:

| col_1 | col_2 | col_3 | col_4 |
| ----- | ----- | ----- | ----- |
| a1    | b1    | c1    | d1    |
| a2    | b12   | c1    | d1    |
| a3    | b13   | c1    | d1    |
forpas
  • 160,666
  • 10
  • 38
  • 76
1

You can use window functions if you want the complete rows:

select t.*
from (select t.*,
             rank() over (partition by col1 order by cnt desc) as seqnum
      from (select t.*, count(*) over (partition by col1, col3, col4) as cnt
            from t
           ) t
     ) t
where seqnum = 1;

The innermost subquery counts the number of rows for each col1/col3/col4 combination. The middle subquery enumerates the rows the highest count for each col1. The outermost filters for the highest count.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786