Finding duplicates based on multiple columns

Question

I need to find all rows which are duplicates across different locations. Table data is as below,

---------------------------------------------------------
| Assetno  |  SerialNo  |   StickerNo  |     location  |
---------------------------------------------------------
| 1234     |   QWR      |   12ERT      |     123       |
| 1234     |   QWR      |   12ERT      |     567       |
| 7888     |   ytu      |   67UI       |     456       |
| 9000     |   UIO      |   OPIO9      |     8         |
---------------------------------------------------------

Like in the above table I need to find rows like row number 1 and 2.

`ROW_NUMBER() OVER(PARTITION BY <> ORDER BY <>`? What did you try, show us your attempt and the desired output — Ilyes, Jun 10 '19 at 21:25
Possible duplicate of [How do I find duplicates across multiple columns?](https://stackoverflow.com/questions/8149210/how-do-i-find-duplicates-across-multiple-columns) — M. Kanarkowski, Jun 10 '19 at 21:30

Gordon Linoff · Accepted Answer · 2019-06-10T22:02:36.867

3

One method uses window functions. So, this works on your sample data:

select t.*
from (select t.*,
             count(*) over (partition by Assetno, SerialNo, StickerNo) as cnt
      from t
     ) t
where cnt >= 2;

I would be more inclined to use exists, though:

select t.*
from t
where exists (select 1
              from t t2
              where t2.Assetno = t.Assetno and
                    t2.SerialNo = t.SerialNo and
                    t2.StickerNo = t.StickerNo and
                    t2.location <> t.location
             );

This more explicitly specifies that the rows with the first three columns the same have different locations. It is probably faster with an index on (Assetno, SerialNo, StickerNo, location) as well.

edited Jun 10 '19 at 22:02

answered Jun 10 '19 at 21:55

Gordon Linoff

1,242,037
58
646
786

1

@RaymondNijland . . . Not at all. The intention is to pick up all duplicate rows, as requested by the OP. – Gordon Linoff Jun 10 '19 at 21:58
*"Not at all. The intention is to pick up all duplicate rows, as requested by the OP. "* indeed your right i missed that point.. Adding the `ORDER BY` as in `count(*) over (partition by Assetno, SerialNo, StickerNo ORDER BY ) as cnt` in this case will add unneeded processing – Raymond Nijland Jun 10 '19 at 22:00
Thank You so much, both the methods worked like a champ. – Abhi Jun 10 '19 at 22:28

Finding duplicates based on multiple columns

1 Answers1