How do I locate duplicate records before I create a UNIQUE key?

Question

I'm going to create a UNIQUE key in a large, old table (several hundred thousand rows) to enforce each row containing unique data. I can't just up and create the index though, because the table already contains some duplicates. I don't know how many duplicates it contains, or if the duplicates are necessary. In short, I can't analyse the problem before I can come up with a query that selects all the rows that would violate the future UNIQUE key.

Say I have a table with the fields col_one, col_two, col_three and col_four. My future UNIQUE key will contain col_two and col_three. How do I select all the rows in the table where col_two and col_three has a duplicate in the same table?

You can scroll through this to find duplicate records in a column. http://stackoverflow.com/questions/854128/find-duplicate-records-in-mysql — Dead Man, Mar 18 '13 at 08:28

Matt Gibson · Answer 1 · 2013-03-19T08:25:48.327

4

Use a group.

SELECT 
  col_two,
  col_three,
  COUNT (*) AS `num_dupes`
FROM
  your_table
GROUP BY
  col_two,
  col_three
HAVING 
  COUNT(*) > 1 -- Only return results where there is at least one duplicate.

If you want to see all results from your table where duplicates exist, you can just join from these results to the table:

SELECT
  your_table.*
FROM
  your_table
    INNER JOIN 
    (SELECT 
      col_two,
      col_three,
      COUNT (*) AS `num_dupes`
    FROM
      your_table
    GROUP BY
      col_two,
      col_three
    HAVING 
      COUNT(*) > 1) dupes 
      ON your_table.col_two = dupes.col_two AND
         your_table.col_three = dupes.col_three

edited Mar 19 '13 at 08:25

answered Mar 18 '13 at 08:32

Matt Gibson

37,886
9
99
128

Would it be possible to select all the duplicate rows without grouping them? In case I want to examine the difference in columns that aren't included in the UNIQUE key. Perhaps I would need to include a programming language at this point. – Hubro Mar 19 '13 at 05:51
You can quite easily do this by chaining on another query based on the results of this query. I'll add that to my answer. – Matt Gibson Mar 19 '13 at 08:19

How do I locate duplicate records before I create a UNIQUE key?

1 Answers1