Deleting duplicate rows, no unique keys - relational table

Question

I have a relational table that connects two other tables based on their IDs. There can be duplicates for both columns - but there CANNOT be the same row twice. I handle the checking code side.

How do I remove duplicate rows (see below):

select * from people:

a | b
1   2
1   3
1   3
1   7
2   3
2   5
2   5
2   9

I want the result to be:

Check this Post - http://stackoverflow.com/questions/2630440/how-to-delete-duplicates-on-mysql-table — rs., Oct 12 '12 at 19:15
Create another (temporary) table with a `UNIQUE` key, then insert over and it should ignore duplicates. — Kermit, Oct 12 '12 at 19:16
+1 to @njk. I guess the lesson here is never to create a table without a unique key, preferably an auto-increment ID field. — Spudley, Oct 12 '12 at 19:18
The table isn't an auto increment. It's just a relationship table between two other tables. So technically I'm allowed duplicates in either column, but the combination of the two columns (row) CANNOT exist twice. — user82302124, Oct 12 '12 at 19:29
@user82302124, can you provide the actual table structures? In your answer, I don't see what 'two columns' you just referred to. — Ross Smith II, Oct 12 '12 at 19:39
I think @RossSmithII is asking for the structures of the tables that `people` establishes relationship between. — Miserable Variable, Oct 12 '12 at 21:31

Ross Smith II · Answer 1 · 2012-10-12T21:24:06.320

3

This should work:

ALTER IGNORE TABLE people ADD UNIQUE (a,b);

If you don't want to add an index, then this should work:

DROP TABLE IF EXISTS people_old;
DROP TABLE IF EXISTS people_new;
CREATE TABLE people_new LIKE people;
INSERT INTO people_new SELECT DISTINCT * FROM people;
RENAME TABLE people TO people_old, people_new TO people;

edited Oct 12 '12 at 21:24

answered Oct 12 '12 at 19:19

Ross Smith II

11,799
1
38
43

I'm not sure if this is what I want with my example - I'm not looking to make any column unique? – user82302124 Oct 12 '12 at 20:42
Is the concern about adding an index, that by doing so, the code that inserts the records will fails when inserting the duplicates? You could solve this by changing the `INSERT`s to `INSERT IGNORE`s. – Ross Smith II Oct 12 '12 at 21:26

score 1 · Answer 2 · answered Oct 12 '12 at 19:22

This is how you can delete duplicate rows... I'll write you my example and you'll need to apply to your code. I have Actors table with ID and I want to delete the rows with repeated first_name

mysql> select actor_id, first_name from actor_2;
+----------+-------------+
| actor_id | first_name  |
+----------+-------------+
|        1 | PENELOPE    |
|        2 | NICK        |
|        3 | ED          |
....
|      199 | JULIA       |
|      200 | THORA       |
+----------+-------------+

200 rows in set (0.00 sec)

-Now I use a Variable called @a to get the ID if the next row have the same first_name(repeated, null if it's not).

mysql> select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name;
+---------------+----------------+
|  first_names  | @a:=first_name |
+---------------+----------------+
|          NULL | ADAM           |
|            71 | ADAM           |
|          NULL | AL             |
|          NULL | ALAN           |
|          NULL | ALBERT         |
|           125 | ALBERT         |
|          NULL | ALEC           |
|          NULL | ANGELA         |
|           144 | ANGELA         |
...
|          NULL | WILL           |
|          NULL | WILLIAM        |
|          NULL | WOODY          |
|            28 | WOODY          |
|          NULL | ZERO           |
+---------------+----------------+
200 rows in set (0.00 sec)

-Now we can get only duplicates ID:

mysql> select first_names from (select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name) as t1;
+-------------+
| first_names |
+-------------+
|        NULL |
|          71 |
|        NULL |
 ...
|          28 |
|        NULL |
+-------------+
200 rows in set (0.00 sec)

-the Final Step, Lets DELETE!

mysql> delete from actor_2 where actor_id in (select first_names from (select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name) as t1);
Query OK, 72 rows affected (0.01 sec)

-Now lets check our table:

mysql> select count(*) from actor_2 group by first_name;
+----------+
| count(*) |
+----------+
|        1 |
|        1 |
|        1 |
...
|        1 |
+----------+
128 rows in set (0.00 sec)

it works, if you have any question write me back

score 1 · Accepted Answer · answered Oct 12 '12 at 21:34

There can be duplicates for both columns - but there CANNOT be the same row twice

That's a constraint on the table that you have not implemented. The constraint is a unique index on (a,b). If you had the index you would not have duplicates.

IMHO your best approach is to add the unique index to the table, using a temporary table to first remove the duplicates:

Copy person to person_temp
Delete all from person
Add unique index to person
Copy unique a,b from person_temp to `person.

Thanks - Trying this now. Pretty sure this is what I need to do. — user82302124, Oct 12 '12 at 22:19

Deleting duplicate rows, no unique keys - relational table

3 Answers3