138

So I want to do something like this sql code below:

select s.id, s.name,s.city 
from stuff s
group by s.name having count(where city and name are identical) > 1

To produce the following, (but ignore where only name or only city match, it has to be on both columns):

id      name  city   
904834  jim   London  
904835  jim   London  
90145   Fred  Paris   
90132   Fred  Paris
90133   Fred  Paris
Logica
  • 977
  • 4
  • 16
NimChimpsky
  • 46,453
  • 60
  • 198
  • 311

10 Answers10

179

Duplicated id for pairs name and city:

select s.id, t.* 
from [stuff] s
join (
    select name, city, count(*) as qty
    from [stuff]
    group by name, city
    having count(*) > 1
) t on s.name = t.name and s.city = t.city
Michał Powaga
  • 22,561
  • 8
  • 51
  • 62
  • Note that if either ```name``` or ```city``` contain ```null```, then they will fail to be reported in the outer query, but will be matched in the inner query. – Adam Parkin Jan 13 '15 at 23:52
  • 5
    If the values can possibly contain ```null``` then (unless I'm missing something) you need to change it to a ```CROSS JOIN``` (full Cartesian product) and then add a ```WHERE``` clause such as: ```WHERE ((s.name = t.name) OR (s.name is null and t.name is null)) AND ((s.city = t.city) OR (s.city is null and t.city is null))``` – Adam Parkin Jan 14 '15 at 16:03
  • This answer will not return unique IDs of each duplicated record. Instead, it will merge the duplicated records, into a single record, and choose whichever ID appears first in the table. I believe the answer by @ssarabando is a more appropriate answer. – Crayons Aug 17 '21 at 20:45
  • 1
    @Crayons Both answers will return the same results. See https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=8bf900c22bbb263ef792f73e8355cdc2 and https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=8e94833cb4dd27a1974a1a362ff2a11c – Jack B Dec 08 '21 at 14:59
114
 SELECT name, city, count(*) as qty 
 FROM stuff 
 GROUP BY name, city HAVING count(*)> 1
Plamen G
  • 4,729
  • 4
  • 33
  • 44
Sunnny
  • 1,173
  • 1
  • 7
  • 3
26

Something like this will do the trick. Don't know about performance, so do make some tests.

select
  id, name, city
from
  [stuff] s
where
1 < (select count(*) from [stuff] i where i.city = s.city and i.name = s.name)
ssarabando
  • 3,397
  • 2
  • 36
  • 42
  • 1
    This is an under-rated answer and I believe it's the best one here. This answer identifies duplicates, while returning individual records and their unique ID's. The marked answer groups the results, meaning you cannot actually identify the duplicates by their unique ID's, and is therefore a less useful dataset. – Crayons Aug 17 '21 at 20:48
  • 1
    @Crayons Both answers will return the same results. See https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=8bf900c22bbb263ef792f73e8355cdc2 and https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=8e94833cb4dd27a1974a1a362ff2a11c – Jack B Dec 08 '21 at 15:00
9

Using count(*) over(partition by...) provides a simple and efficient means to locate unwanted repetition, whilst also list all affected rows and all wanted columns:

SELECT
    t.*
FROM (
    SELECT
        s.*
      , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
    FROM stuff s
    ) t
WHERE t.qty > 1
ORDER BY t.name, t.city

While most recent RDBMS versions support count(*) over(partition by...) MySQL V 8.0 introduced "window functions", as seen below (in MySQL 8.0)

CREATE TABLE stuff(
   id   INTEGER  NOT NULL
  ,name VARCHAR(60) NOT NULL
  ,city VARCHAR(60) NOT NULL
);
INSERT INTO stuff(id,name,city) VALUES 
  (904834,'jim','London')
, (904835,'jim','London')
, (90145,'Fred','Paris')
, (90132,'Fred','Paris')
, (90133,'Fred','Paris')

, (923457,'Barney','New York') # not expected in result
;
SELECT
    t.*
FROM (
    SELECT
        s.*
      , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
    FROM stuff s
    ) t
WHERE t.qty > 1
ORDER BY t.name, t.city
    id | name | city   | qty
-----: | :--- | :----- | --:
 90145 | Fred | Paris  |   3
 90132 | Fred | Paris  |   3
 90133 | Fred | Paris  |   3
904834 | jim  | London |   2
904835 | jim  | London |   2

db<>fiddle here

Window functions. MySQL now supports window functions that, for each row from a query, perform a calculation using rows related to that row. These include functions such as RANK(), LAG(), and NTILE(). In addition, several existing aggregate functions now can be used as window functions; for example, SUM() and AVG(). For more information, see Section 12.21, “Window Functions”.

Community
  • 1
  • 1
Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51
7

I found this way to be pretty flexible / efficient

select 
    s1.id
    ,s1.name
    ,s1.city 
from 
    stuff s1
    ,stuff s2
Where
    s1.id <> s2.id
    and s1.name = s2.name
    and s1.city = s2.city
TylerH
  • 20,799
  • 66
  • 75
  • 101
MattD
  • 71
  • 1
  • 1
5
SELECT Feild1, Feild2, COUNT(*)
FROM table name
GROUP BY Feild1, Feild2
HAVING COUNT(*)>1

This will give you all your answers.

TylerH
  • 20,799
  • 66
  • 75
  • 101
2

You have to self join stuff and match name and city. Then group by count.

select 
   s.id, s.name, s.city 
from stuff s join stuff p ON (
   s.name = p.city OR s.city = p.name
)
group by s.name having count(s.name) > 1
Anja
  • 324
  • 1
  • 2
1

From OP question, OP wants to group columns and get additional columns that aren't grouping columns.

so that regular group by + having might not be worked.

I would use EXISTS subquery with HAVING.

we can try to add columns which you want to mark duplicate in a subquery.

SELECT s.id, s.name,s.city 
FROM stuff s
WHERE EXISTS (
   SELECT 1
   FROM stuff ss
   WHERE 
      s.name = ss.name
   AND
      s.city = ss.city
   GROUP BY ss.name,ss.city
   HAVING COUNT(*) > 1
)

If we create a suitable Index might get better performance than join

CREATE INDEX IX_name ON stuff (
    name,
    city
);

Another way we can use COUNT window function with filter condition to make it which add grouping columns in PARTITION BY part

SELECT s.id, s.name,s.city 
FROM (
   SELECT *,COUNT(*) OVER(PARTITION BY name,city) cnt
   FROM stuff 
) s
WHERE cnt > 1

sqlfiddle

D-Shih
  • 44,943
  • 6
  • 31
  • 51
1

It's a pleasure to add another way of achieving the required output using Cross Apply here like below :

select s.* from stuff s
cross apply(
    select name, city from stuff
    group by name, city
    having Count(*) > 1) x
where s.name = x.name and s.city=x.city
Md. Suman Kabir
  • 5,243
  • 5
  • 25
  • 43
-1

Given a staging table with 70 columns and only 4 representing duplicates, this code will return the offending columns:

SELECT 
    COUNT(*)
    ,LTRIM(RTRIM(S.TransactionDate)) 
    ,LTRIM(RTRIM(S.TransactionTime))
    ,LTRIM(RTRIM(S.TransactionTicketNumber)) 
    ,LTRIM(RTRIM(GrossCost)) 
FROM Staging.dbo.Stage S
GROUP BY 
    LTRIM(RTRIM(S.TransactionDate)) 
    ,LTRIM(RTRIM(S.TransactionTime))
    ,LTRIM(RTRIM(S.TransactionTicketNumber)) 
    ,LTRIM(RTRIM(GrossCost)) 
HAVING COUNT(*) > 1

.

Georges Legros
  • 2,494
  • 2
  • 23
  • 42
Don G.
  • 1