1

I have a table in postgres like below

table

I want an sql in postgres that count a combination of 2 columns that has YY

Expecting an output like

Combination Count

AB 2
AC 1
AD 2
AZ 1
BC 1
BD 3
BZ 2
CD 2
CZ 0
DZ 1

Can anyone help me?

Mehman Q
  • 13
  • 2

2 Answers2

2
WITH stacked AS (
    SELECT id
        , unnest(array['A', 'B', 'C', 'D', 'Z']) AS col_name
        , unnest(array[a, b, c, d, z]) AS col_value
    FROM test t
)
SELECT combo, sum(cnt) AS count
FROM (
    SELECT t1.id, t1.col_name || t2.col_name AS combo
        , (CASE WHEN t1.col_value = 'Y' AND t2.col_value = 'Y' THEN 1 ELSE 0 END) AS cnt
    FROM stacked t1
    INNER JOIN stacked t2
    ON t1.id = t2.id
    AND t1.col_name < t2.col_name) t3
GROUP BY combo
ORDER BY combo

yields

| combo | count |
|-------+-------|
| AB    |     2 |
| AC    |     1 |
| AD    |     2 |
| AZ    |     2 |
| BC    |     1 |
| BD    |     3 |
| BZ    |     2 |
| CD    |     2 |
| CZ    |     0 |
| DZ    |     1 |

The unnesting recipe for unpivoting the table comes from Stew's post, here.


To count occurrances of YYY among 3 columns you could use:

WITH stacked AS (
    SELECT id
        , unnest(array['A', 'B', 'C', 'D', 'Z']) AS col_name
        , unnest(array[a, b, c, d, z]) AS col_value
    FROM test t
)
SELECT combo, sum(cnt) AS count
FROM (
    SELECT t1.id, t1.col_name || t2.col_name || t3.col_name AS combo
        , (CASE WHEN t1.col_value = 'Y' 
               AND t2.col_value = 'Y'
               AND t3.col_value = 'Y' THEN 1 ELSE 0 END) AS cnt
    FROM stacked t1
    INNER JOIN stacked t2
    ON t1.id = t2.id
    INNER JOIN stacked t3
    ON t1.id = t3.id
    AND t1.col_name < t2.col_name 
    And t2.col_name < t3.col_name
    ) t3
GROUP BY combo
ORDER BY combo
;

which yields

| combo | count |
|-------+-------|
| ABC   |     0 |
| ABD   |     1 |
| ABZ   |     2 |
| ACD   |     1 |
| ACZ   |     0 |
| ADZ   |     1 |
| BCD   |     1 |
| BCZ   |     0 |
| BDZ   |     1 |
| CDZ   |     0 |

Or, to handle combinations of N columns, you could use WITH RECURSIVE: For example, for N = 3,

WITH RECURSIVE result AS (
    WITH stacked AS (
        SELECT id
            , unnest(array['A', 'B', 'C', 'D', 'Z']) AS col_name
            , unnest(array[a, b, c, d, z]) AS col_value
        FROM test t)
    SELECT id, array[col_name] AS path, array[col_value] AS path_val, col_name AS last_name
    FROM stacked

    UNION

    SELECT r.id, path || s.col_name, path_val || s.col_value, s.col_name
    FROM result r
    INNER JOIN stacked s
    ON r.id = s.id
        AND s.col_name > r.last_name
    WHERE array_length(r.path, 1) < 3)  -- Change 3 to your value for N
SELECT combo, sum(cnt)
FROM (
    SELECT id, array_to_string(path, '') AS combo, (CASE WHEN 'Y' = all(path_val) THEN 1 ELSE 0 END) AS cnt
    FROM result
    WHERE array_length(path, 1) = 3) t  -- Change 3 to your value for N
GROUP BY combo
ORDER BY combo

Note that N = 3 is used in 2 places in the SQL above.

unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • Thank you, Ubuntu. This is exactly the thing i wanted. Wonderful. Any idea how to make it for a combination of 3 columns? like YYY. just curious. Thank you in advance! – Mehman Q Feb 20 '19 at 12:04
  • Brilliant !! The case with N works for everything. Thank you ! – Mehman Q Feb 21 '19 at 08:13
0

I would do this using a lateral join:

with vals as (
      select v.*
      from t cross join lateral
           (values ('A', A), ('B', B), ('C', C), ('D', D), ('Z', Z)
           ) v(which, val)
     )
select (v1.which || v2.which) as combo,
       sum( (val = 'Y')::int ) as count
from vals v1 join
     vals v2
     on v1.which < v2.which
group by combo
order by combo;

I consider lateral joins to be a more direct way to unpivot the values. There is no need to convert the values to an array an unnest, much less unnest two arrays and align the values.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786