You only need a single subquery with DISTINCT ON
:
SELECT DISTINCT ON (col1, col2)
col1, col2, col3, min(col4) As col4
FROM tbl
GROUP BY col1, col2, col3
ORDER BY col1, col2, count(*) DESC, col3;
This way you get a single row per (col1, col2)
with the most common col3
(the samllest value if multiple tie for "most common") an the smallest col4
to go along with that col3
.
Similarly, to get all qualifying col3
you can use the window function rank()
in a subquery, which is also executed after the aggregation:
SELECT col1, col2, col3, col4_list
FROM (
SELECT col1, col2, col3, count(*) AS ct, string_agg(col4, '/') AS col4_list
, rank() OVER (PARTITION BY col1, col2 ORDER BY count(*) DESC) AS rnk
FROM tbl
GROUP BY col1, col2, col3
) sub
WHERE rnk = 1
ORDER BY col1, col2, col3;
This works, because you can run window functions over aggregate functions.
Cast to text
if the data type is not a character type.
Or, all qualifying col3
per (col1, col2)
in a list, plus all matching col4
in a second list:
SELECT col1, col2
, string_agg(col3::text, '/') AS col3_list -- cast if necessary
, string_agg(col4_list, '/') AS col4_list
FROM (
SELECT col1, col2, col3, count(*) AS ct, string_agg(col4, '/') AS col4_list
, rank() OVER (PARTITION BY col1, col2 ORDER BY count(*) DESC) AS rnk
FROM tbl
GROUP BY col1, col2, col3
) sub
WHERE rnk = 1
GROUP BY col1, col2
ORDER BY col1, col2, col3_list;
Related answers with more explanation:
Solution for Amazon Redshift
row_number()
is available, so this should work:
SELECT col1, col2, col3, col4
FROM (
SELECT col1, col2, col3, min(col4) AS col4
, row_number() OVER (PARTITION BY col1, col2
ORDER BY count(*) DESC, col3) AS rn
FROM tbl
GROUP BY col1, col2, col3
) sub
WHERE rn = 1
ORDER BY col1, col2;
Or if window functions over aggregate functions are not allowed, use another subquery
SELECT col1, col2, col3, col4
FROM (
SELECT *, row_number() OVER (PARTITION BY col1, col2
ORDER BY ct DESC, col3) AS rn
FROM (
SELECT col1, col2, col3, min(col4) AS col4, COUNT(*) AS ct
FROM tbl
GROUP BY col1, col2, col3
) sub1
) sub2
WHERE rn = 1;
This picks the smallest col3
if more than one tie for the maximum count. And the smallest col4
for the respective col3
.
SQL Fiddle demonstrating all in Postgres 9.3.