
How can I update this to select all duplicates?

SELECT address FROM list  
GROUP BY address HAVING count(id) > 1

Currently, I think it just returns the addresses that are duplicated. I want all the duplicate rows.

  • possible duplicate of [mysql INNER JOIN syntax](http://stackoverflow.com/questions/7655922/mysql-inner-join-syntax) – Jonathan Leffler Oct 05 '11 at 02:09
  • If there are 5 duplicates will it return a row for each? –  Oct 05 '11 at 02:10
  • Even though not an exact duplicate, this is asking a subset of the problem asked by the other question: [MySQL INNER JOIN Syntax](http://stackoverflow.com/questions/7655922/mysql-inner-join-syntax). – Jonathan Leffler Oct 05 '11 at 02:10
  • Your question is not very clear. What 'duplicates' are you wanting to select? Are you looking for specific columns to be returned? Also, the table definition/schema may be helpful. – Adam Wagner Oct 05 '11 at 02:10
  • No - it will return one copy of the address for each duplicated address. The GROUP BY ensures that similar values are collected together; the HAVING ensures that only repeated addresses are listed. – Jonathan Leffler Oct 05 '11 at 02:11

3 Answers

SELECT * FROM list
WHERE address IN (
  SELECT address FROM list GROUP BY address
  HAVING COUNT(*) > 1);

Look at this sample query I ran:

mysql> select * from flights;
+--------+-------------+
| source | destination |
+--------+-------------+
|      1 |           2 |
|      3 |           4 |
|      5 |           6 |
|      6 |           1 |
|      2 |           4 |
|      1 |           3 |
|      5 |           2 |
|      6 |           3 |
|      6 |           5 |
|      6 |           4 |
+--------+-------------+
10 rows in set (0.00 sec)

mysql> select * from flights where source in 
       (select source from flights group by source having count(*) > 1);
+--------+-------------+
| source | destination |
+--------+-------------+
|      1 |           2 |
|      5 |           6 |
|      6 |           1 |
|      1 |           3 |
|      5 |           2 |
|      6 |           3 |
|      6 |           5 |
|      6 |           4 |
+--------+-------------+
8 rows in set (0.00 sec)
varunl
  • what if there were, say, 3 X addresses... would I get two rows returned? ...or just the first duplicate? –  Oct 05 '11 at 02:18
  • No you will just get one row returned... Let me add that case too in the sample query I wrote. – varunl Oct 05 '11 at 02:19
  • I am still not sure what you need. It will be great if you can add a sample table and your desired output. – varunl Oct 05 '11 at 02:22
  • they have it here...but there are like 5 answers..which one? –  Oct 05 '11 at 02:23
  • you have to INNER JOIN or something –  Oct 05 '11 at 02:23
  • Now I understand what you need, give a min to check that. – varunl Oct 05 '11 at 02:25
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/4008/discussion-between-topcoder-and-chris-aaker) – varunl Oct 05 '11 at 02:26

If I'm correct, you're looking for the actual rows that contain duplicates -- so that if you have three rows with the same address, you return all three rows.

Here's how to do it:

SELECT * FROM list
WHERE address in (
    SELECT address FROM list GROUP BY address HAVING count(id) > 1
);

This should generally work, though performance may suffer if `address` is a TEXT column or if the table contains more than a few thousand duplicates, since the `IN` subquery has to be matched against every row.
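As a couple of the comments above suggest, the same result can also be obtained with an INNER JOIN against the grouped subquery, which some MySQL versions handle better than a `WHERE ... IN` subquery. This is just a sketch assuming the same `list(id, address)` table as above:

    -- Join each row to the set of addresses that occur more than once.
    SELECT l.*
    FROM list AS l
    INNER JOIN (
        SELECT address
        FROM list
        GROUP BY address
        HAVING COUNT(*) > 1
    ) AS dup ON dup.address = l.address;

This returns the same rows as the `IN` version, but the derived table of duplicated addresses is materialized once and then joined, rather than re-checked per row.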

Kevin Bedell

Are you looking for this?

SELECT * FROM list
WHERE id IN (
    SELECT id FROM list
    GROUP BY address HAVING count(id) > 1
);
bfavaretto
  • Since you're grouping by address, I believe this will pull back only a single row for each address -- instead of all rows for the address. The `count(id)` in your sub-select I believe is a clue that there is more than one table row aggregated into each row of the sub-select. – Kevin Bedell Oct 05 '11 at 02:37
  • @KevinBedell, you are absolutely right, I figured a subselect was needed, and just didn't think much when I actually wrote it. Your solution is the correct one. – bfavaretto Oct 05 '11 at 02:48