SQL Select column which is not used in select section of subquery which find duplicates

Question

I am trying to find in my database records which has duplicated fields like name, surname and type.

Example:

SELECT name, surname, type, COUNT(*)
FROM customers
GROUP BY name, surname
HAVING COUNT(*)>1

Query results:

Robb|Stark|1|2

Tyrion|Lannister|1|3

So we have duplicated customer with name and surname "Robb Stark" 2 times and "Tyrion Lannister" 3 times Now, I want to know the id of these records. I found similar problem described here: Finding duplicate values in a SQL table there is answer but no example.

"I want to know the id of these records." Your question has no sample data showing what the "id" is. Your query is also malformed and will return an error in SQL Server, because `type` is not in the `GROUP BY`. — Gordon Linoff, Dec 21 '20 at 12:40

score 1 · Accepted Answer · answered Dec 21 '20 at 09:15

1

Use COUNT as an analytic function:

WITH cte AS (
    SELECT *, COUNT(*) OVER (PARTITION BY name, surname) cnt
    FROM customers
)

SELECT *                -- return all columns
FROM cte
WHERE cnt > 1
ORDER BY name, surname;

answered Dec 21 '20 at 09:15

Tim Biegeleisen

502,043
27
286
360

thank you. Is there also possible to show results like: Robb Stark|454,854 where 454 and 854 are ids? – maniootek Dec 21 '20 at 09:40
Well you could also include `GROUP_CONCAT(id) OVER (PARTITION BY name, surname) ids` inside the CTE. But now you really seem to be asking for 2 different things. If you only want aggregates, then stick with your current `GROUP BY` query. If you want to retain every record, then use my answer. – Tim Biegeleisen Dec 21 '20 at 09:46

score 0 · Answer 2 · answered Dec 21 '20 at 09:25

The simplest way will be to use the EXISTS as follows:

SELECT t.*
  FROM customers t
 where exists 
  (select 1 from customers tt
    where tt.name = t.name
      and tt.surname = t.surname
      and tt.id <> t.id)

Or use your original query in IN clause as follows:

select * from customers where (name, surname) in
(SELECT name, surname
FROM customers
GROUP BY name, surname
HAVING COUNT(*)>1)

GMB · Answer 3 · 2020-12-21T15:50:40.890

0

If you want one row per group of duplicate, with the list of id in a comma separated string, you can just use string aggration with your existing query:

SELECT name, surname, COUNT(*) as cnt,
    STRING_AGG(id, ',') WITHIN GROUP (ORDER BY id) as all_ids
FROM customers
GROUP BY name, surname
HAVING COUNT(*) > 1

edited Dec 21 '20 at 15:50

answered Dec 21 '20 at 10:02

GMB

216,147
25
84
135

I think my SQL does not support GROUP_CONCAT() and JSON_ARRAYAGG() functions – maniootek Dec 21 '20 at 10:25
@maniootek: I was under the impression that you are running MySQL, but it seems not. What database are you running? Which error do you get when you run my queries? – GMB Dec 21 '20 at 10:26
I use Microsoft SQL database, not MySQL – maniootek Dec 21 '20 at 10:34
Query works but only if I delete second line (all_ids select statement). When I leave it, then I got error `Incorrect syntax near '('.` – maniootek Dec 21 '20 at 11:17
@maniootek: `string_agg()` was added in SQL Server 2017... Maybe you are running an older version? – GMB Dec 21 '20 at 17:46

SQL Select column which is not used in select section of subquery which find duplicates

3 Answers3