0

Given a table myTable with 4 columns, say Col1, Col2, Col3 and Col4:

A X 5 B
A Y 5 C
A X 7 D
A Y 3 E 
A X 7 F

I need to find most occurring col3 for each pair (col1, col2).

So result for this example will be:

A X 7   D/F  -- D or F
A Y 5/3 C/E  -- It can be 5 and C or 3 and E

So I wrote a query something like this :

select Col1,Col2,Col3 
from myTable M 
group by Col1,Col2,Col3 
having Col3 =
     (select Col3 
      from myTable N 
      where M.Col1=N.col1 
      group by Col3 
      order by Col3 desc limit 1); 

But the query does not give the desired result.
Also I have no idea how to get Col4 as am making group by clause, and I don't want to make group by according to Col4.

For each (Col1, Col2) pair I want single Col4 that goes along with the maximum occurring Col3.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
GitCoder
  • 121
  • 2
  • 13

2 Answers2

1

You only need a single subquery with DISTINCT ON:

SELECT DISTINCT ON (col1, col2)
       col1, col2, col3, min(col4) As col4
FROM   tbl
GROUP  BY col1, col2, col3
ORDER  BY col1, col2, count(*) DESC, col3;

This way you get a single row per (col1, col2) with the most common col3 (the samllest value if multiple tie for "most common") an the smallest col4 to go along with that col3.


Similarly, to get all qualifying col3 you can use the window function rank() in a subquery, which is also executed after the aggregation:

SELECT col1, col2, col3, col4_list
FROM  (
   SELECT col1, col2, col3, count(*) AS ct, string_agg(col4, '/') AS col4_list
        , rank() OVER (PARTITION BY col1, col2 ORDER BY count(*) DESC) AS rnk
   FROM   tbl
   GROUP  BY col1, col2, col3
   ) sub
WHERE  rnk = 1
ORDER  BY col1, col2, col3;

This works, because you can run window functions over aggregate functions.
Cast to text if the data type is not a character type.


Or, all qualifying col3 per (col1, col2) in a list, plus all matching col4 in a second list:

SELECT col1, col2
     , string_agg(col3::text, '/') AS col3_list  -- cast if necessary
     , string_agg(col4_list,  '/') AS col4_list
FROM  (
   SELECT col1, col2, col3, count(*) AS ct, string_agg(col4, '/') AS col4_list
        , rank() OVER (PARTITION BY col1, col2 ORDER BY count(*) DESC) AS rnk
   FROM   tbl
   GROUP  BY col1, col2, col3
   ) sub
WHERE  rnk = 1
GROUP  BY col1, col2
ORDER  BY col1, col2, col3_list;

Related answers with more explanation:

Solution for Amazon Redshift

row_number() is available, so this should work:

SELECT col1, col2, col3, col4
FROM  (
   SELECT col1, col2, col3, min(col4) AS col4
        , row_number() OVER (PARTITION BY col1, col2
                             ORDER BY count(*) DESC, col3) AS rn
   FROM   tbl
   GROUP  BY col1, col2, col3
   ) sub
WHERE  rn = 1
ORDER  BY col1, col2;

Or if window functions over aggregate functions are not allowed, use another subquery

SELECT col1, col2, col3, col4
FROM  (
   SELECT *, row_number() OVER (PARTITION BY col1, col2
                                ORDER BY ct DESC, col3) AS rn
   FROM (
      SELECT col1, col2, col3, min(col4) AS col4, COUNT(*) AS ct
      FROM   tbl
      GROUP  BY col1, col2, col3
      ) sub1
   ) sub2
WHERE  rn = 1;

This picks the smallest col3 if more than one tie for the maximum count. And the smallest col4 for the respective col3.

SQL Fiddle demonstrating all in Postgres 9.3.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0

One way of doing this is with the row_number() window function on top of the aggregate query:

SELECT col1, col2, col3
FROM   (SELECT col1, col2, col3, 
        ROW_NUMBER () OVER (PARTITION BY col1, col2 ORDER BY cnt DESC) AS rn
        FROM (SELECT   col1, col2, col3, COUNT(*) AS cnt
              FROM     mytable
              GROUP BY col1, col2, col3) t
       ) q
WHERE  rn = 1
Mureinik
  • 297,002
  • 52
  • 306
  • 350
  • ERROR: syntax error at or near "PARTITION" LINE 3: row_number (PARTITION BY comp_sku, zone_id ORDER BY ...What can be the reason ? – GitCoder Sep 22 '15 at 22:41
  • @GitCoder the reason is I wasn't paying attention and completely forgot the `over` clause. Edited and fixed, try now. – Mureinik Sep 22 '15 at 22:44
  • Suppose if we have one more column Col4, and want in result, then how to get this also. Because we dont want it in group by clause – GitCoder Sep 22 '15 at 23:36
  • Like for any Col1,Col2 pair any Col4 matching the criteria will work out if there are multiple options. – GitCoder Sep 23 '15 at 00:27