how can i find duplicates across multiple columns

Question

I am looking for records where there are duplicate values in multiple columns. For example:

| id  |column1 | column2|  column3  | column4|  
| 1   | 1      | main   |  101      |   234  |
| 2   | 1      | main   |  101      |   234  |
| 3   | 2      | main   |  101      |   234  |
| 4   | 2      | main   |  103      |   234  |

I want to look where there is a match on column1, column2, column3, and column4. I don't want a match on the id. The following should be returned:

| id  |column1 | column2|  column3  | column4|  
| 1   | 1      | main   |  101      |   234  |
| 2   | 1      | main   |  101      |   234  |

The other 2 would not be returned because they did not match on one of the columns.

You want a match for _all columns_ or you want a match for _1 or more columns_? — GrumpyCrouton, Jan 31 '18 at 19:35
for the 4 columns; basically i am trying to find where there are 2 or more instances where those 4 columns match — user7002207, Jan 31 '18 at 19:37
This may have duplicates, but the stated duplicate *did not keep the original rows*. Apparently, that is important for this question. — Gordon Linoff, Jan 31 '18 at 19:44

score 1 · Answer 1 · answered Jan 31 '18 at 19:42

1

You can use GROUP BY and COUNT to identify duplicates:

SELECT COUNT(id) as duplicates
,column1 
,column2
,column3
,column4
FROM mytable
GROUP BY column1  , column2 , column3 , column4
HAVING COUNT(id) > 1;

answered Jan 31 '18 at 19:42

Greg Viers

3,473
3
18
36

1

Although this finds duplicates, it does not return the *original* rows, which appears to be the intention of this question. – Gordon Linoff Jan 31 '18 at 19:45

score 0 · Answer 2 · answered Jan 31 '18 at 19:44

One method uses exists:

select t.*
from t
where exists (select 1
              from t t2
              where t.column1 = t2.column1 and
                    t.column2 = t2.column2 and
                    t.column3 = t2.column3 and
                    t.column4 = t2.column4
             );

Note that this does not work if any of the columns are NULL.

You can handle NULL values by using window functionsL

select t.*
from (select t.*,
             count(*) over (partition by column1, column2, column3, column4) as cnt
      from t
     ) t
where cnt > 1
order by column1, column2, column3, column4;

score 0 · Answer 3 · answered Jan 31 '18 at 19:44

Try something like this, I had no way of testing this. So i am not 100% if this will work.

Select Id,column1, column2,column3,column4  
From Table 1 t1
Join (
       select column1, column2,column3,column4 From table 1
       Group by column1, column2,column3,column4 
        having Count(*) >1 
     ) dup on dup.column1 = t1.column1 and 
              dup.column2 = t1.column2 and 
              dup.column3 = t1.column3 and 
              dup.column4 = t1.column4

this should show you all the ids that are with dup columns

score 0 · Answer 4 · answered Jan 31 '18 at 20:13

Without sub-queries and exists.

SELECT
    t1.id, 
    t1.column1, 
    t1.column2, 
    t1.column3
FROM
    table t1
LEFT JOIN
    table t2 ON 
    t1.column1 = t2.column1 AND 
    t1.column2 = t2.column2 AND 
    t1.column3 = t2.column3 AND 
    t1.id <> t2.id
WHERE
    t2.id IS NOT NULL
GROUP BY
    t1.id, t1.column1, t1.column2, t1.column3

how can i find duplicates across multiple columns

4 Answers4