I have the following data:
col_1 | col_2 | col_3 | col_4
-----------------------------
a1 b1 c1 d1
a1 b2 c1 d1
a1 b3 c1 d1
a1 b4 c1 d2
a1 b5 c2 d2
a1 b6 c2 d2
a1 b7 c1 d3
a1 b8 c2 d3
a1 b9 c3 d3
a1 b10 c1 d2
a1 b11 c2 d3
a2 b12 c1 d1
a3 b13 c1 d1
I am interested in being able to:
- Return rows where the value for
col_1
is unique - For each row in the result, it should return the values for the columnns that have the highest counts when grouping by:
col_3
,col_4
For example, I would like the output to return the following:
col_1 | col_2 | col_3 | col_4
-----------------------------
a1 b1 c1 d1
a2 b12 c1 d1
a3 b13 c1 d1
Notice in the result that each value in col_1
is unique. Also note that for a1
, it returned with c1
and d1
as they had the highest group by counts for a1
.
How can I achieve this by SQL query? I will be using it for a Hive SQL query.