0

I am trying to find in my database records which has duplicated fields like name, surname and type.

Example:

SELECT name, surname, type, COUNT(*)
FROM customers
GROUP BY name, surname
HAVING COUNT(*)>1

Query results:

Robb|Stark|1|2

Tyrion|Lannister|1|3

So we have duplicated customer with name and surname "Robb Stark" 2 times and "Tyrion Lannister" 3 times Now, I want to know the id of these records. I found similar problem described here: Finding duplicate values in a SQL table there is answer but no example.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
maniootek
  • 409
  • 1
  • 8
  • 15
  • "I want to know the id of these records." Your question has no sample data showing what the "id" is. Your query is also malformed and will return an error in SQL Server, because `type` is not in the `GROUP BY`. – Gordon Linoff Dec 21 '20 at 12:40

3 Answers3

1

Use COUNT as an analytic function:

WITH cte AS (
    SELECT *, COUNT(*) OVER (PARTITION BY name, surname) cnt
    FROM customers
)

SELECT *                -- return all columns
FROM cte
WHERE cnt > 1
ORDER BY name, surname;
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • thank you. Is there also possible to show results like: Robb Stark|454,854 where 454 and 854 are ids? – maniootek Dec 21 '20 at 09:40
  • Well you could also include `GROUP_CONCAT(id) OVER (PARTITION BY name, surname) ids` inside the CTE. But now you really seem to be asking for 2 different things. If you only want aggregates, then stick with your current `GROUP BY` query. If you want to retain every record, then use my answer. – Tim Biegeleisen Dec 21 '20 at 09:46
0

The simplest way will be to use the EXISTS as follows:

SELECT t.*
  FROM customers t
 where exists 
  (select 1 from customers tt
    where tt.name = t.name
      and tt.surname = t.surname
      and tt.id <> t.id)

Or use your original query in IN clause as follows:

select * from customers where (name, surname) in
(SELECT name, surname
FROM customers
GROUP BY name, surname
HAVING COUNT(*)>1)
Popeye
  • 35,427
  • 4
  • 10
  • 31
0

If you want one row per group of duplicate, with the list of id in a comma separated string, you can just use string aggration with your existing query:

SELECT name, surname, COUNT(*) as cnt,
    STRING_AGG(id, ',') WITHIN GROUP (ORDER BY id) as all_ids
FROM customers
GROUP BY name, surname
HAVING COUNT(*) > 1
GMB
  • 216,147
  • 25
  • 84
  • 135
  • I think my SQL does not support GROUP_CONCAT() and JSON_ARRAYAGG() functions – maniootek Dec 21 '20 at 10:25
  • @maniootek: I was under the impression that you are running MySQL, but it seems not. What database are you running? Which error do you get when you run my queries? – GMB Dec 21 '20 at 10:26
  • I use Microsoft SQL database, not MySQL – maniootek Dec 21 '20 at 10:34
  • Query works but only if I delete second line (all_ids select statement). When I leave it, then I got error `Incorrect syntax near '('.` – maniootek Dec 21 '20 at 11:17
  • @maniootek: `string_agg()` was added in SQL Server 2017... Maybe you are running an older version? – GMB Dec 21 '20 at 17:46