How to count the number of times two values appear in two columns in any order

Question

Lets say, we have this table :

+------+------+
| COL1 | COL2 |
+------+------+
|   A  |   B  |
+------+------+
|   B  |   A  |
+------+------+
|   C  |   D  |
+------+------+

I want to count the number of times either letter1, letter2 or letter2, letter1 appears in the two columns.

I want the result :

+------+------+------+
| COL1 | COL2 | COL3 |
+------+------+------+
|   A  |   B  |   2  | 
+------+------+------+
|   C  |   D  |   1  |
+------+------+------+

NOTE: It can be either AB or BA doesn't matter.

I tried:

SELECT
COL1,COL1,COUNT(*) AS COL3
FROM
X
GROUP BY COL1,COL2;

But that gets me :

+------+------+------+
| COL1 | COL2 | COL3 |
+------+------+------+
|   A  |   B  |   1  |
+------+------+------+
|   B  |   A  |   1  |
+------+------+------+
|   C  |   D  |   1  |
+------+------+------+

Something similar is achieved with a "covariance matrix." However the results would still have to be merged - respectively added up. — AnyOneElse, Jul 01 '15 at 10:12

score 15 · Accepted Answer · edited Jul 01 '15 at 06:52

15

You can do this by swapping the columns if you need to:

SELECT Col1, Col2, COUNT(*)
FROM
(
    SELECT
        CASE WHEN Col1 < Col2 THEN Col1 ELSE Col2 END AS Col1,
        CASE WHEN Col1 < Col2 THEN Col2 ELSE Col1 END AS Col2
    FROM T
) t
GROUP BY Col1, Col2

Fiddle

edited Jul 01 '15 at 06:52

Tim Biegeleisen

502,043
27
286
360

answered Jul 01 '15 at 06:49

Damien_The_Unbeliever

234,701
27
340
448

6

You can make it more elegant using `LEAST`, `GREATEST` and getting rid of the outer query, like [this](http://sqlfiddle.com/#!9/7c4f77/3). +1 for a nice query though. – Giorgos Betsos Jul 01 '15 at 06:57
2

@GiorgosBetsos - I prefer to write SQL that works across all platforms, where possible. – Damien_The_Unbeliever Jul 01 '15 at 06:58

Akhil · Answer 2 · 2015-07-01T07:30:54.033

3

Another try

SELECT LEAST(col1, col2) col11, GREATEST(col1, col2) col12 , COUNT(1) FROM X
GROUP BY col11, col12

SqlFiddle

edited Jul 01 '15 at 07:30

answered Jul 01 '15 at 07:18

Akhil

2,602
23
36

1

I like this answer the best. It's much shorter and more readable. – AdamMc331 Jul 01 '15 at 18:29

score 0 · Answer 3 · edited May 23 '17 at 10:26

0

UPDATE: : Use @Damien's answer. Another try.

You can try the below code. Fiddle

   SELECT COL1, COL2,  COUNT(*) AS COL3
   FROM (
    SELECT
    LEAST(COL1,COL2) AS COL1,
    GREATEST(COL1,COL2)  AS COL2
    FROM X
     ) AS Temp
    GROUP BY COL1,COL2;

edited May 23 '17 at 10:26

Community

1
1

answered Jul 01 '15 at 06:47

sudhansu63

6,025
4
39
52

1

CTE is SQL Server specific. https://msdn.microsoft.com/en-us/library/ms190766.aspx – rurouni88 Jul 01 '15 at 06:52
1

@rurouni88 - it's not SQL Server specific. It's standard SQL (Oracle and Postgre also support it). It's just the MySQL doesn't follow the standard here. – Damien_The_Unbeliever Jul 01 '15 at 06:55
@Damien_The_Unbeliever: Fair enough. At any rate, it's not supported here. Let's not get me off on a rant. I spent the past week looking at different DB platforms different implementation of RAND() with SEED() :P – rurouni88 Jul 01 '15 at 06:58
1

Your query seems wrong if OP has different data. See this [**fiddle**](http://www.sqlfiddle.com/#!9/b0e7e/3) – Raging Bull Jul 01 '15 at 07:04
@RagingBull you are right, i misinterpreted the question – sudhansu63 Jul 01 '15 at 07:15
2

you need `LEAST()` and `GREATEST()`, not MIN and MAX. – ypercubeᵀᴹ Jul 01 '15 at 08:26
1

@sudhAnsu63 You may need to give different alias for your fields. Currently your query is not giving desired result. Check your fiddle itself – Akhil Jul 01 '15 at 09:33
1

`GROUP BY COL1, COL2` prematurely creates duplicate rows (e.g. one for `a, b` and another for `b, a`). – Salman A Jul 01 '15 at 11:39

score -1 · Answer 4 · answered Jul 01 '15 at 07:15

-1

See http://sqlfiddle.com/#!9/4bd6a/23

Use if statements and concat the 2 columns.

SELECT
  DISTINCT (CONCAT(C1,C2)) AS permutation, COUNT(1)
  FROM (SELECT
    IF(col1<=col2, col1, col2) as C1,
    IF(col2<col1, col1, col2) as C2
  FROM X) AS T
  GROUP BY permutation
;

Further explanation: The if statements merely order the characters by ASCII value, so regardless of 'AB' or 'BA', it will always be represented as 'AB'

answered Jul 01 '15 at 07:15

rurouni88

1,165
5
10

That may work with one character values, but if the real values were 'AA' and 'BB' this would falsely match 'AAB' and 'B'. – abligh Jul 01 '15 at 08:21

How to count the number of times two values appear in two columns in any order

4 Answers4

Linked