1

I have a relational table that connects two other tables based on their IDs. There can be duplicates for both columns - but there CANNOT be the same row twice. I handle the checking code side.

How do I remove duplicate rows (see below):

select * from people:

a | b
1   2
1   3
1   3
1   7
2   3
2   5
2   5
2   9

I want the result to be:

a | b
1   2
1   3
1   7
2   3
2   5
2   9
user82302124
  • 1,143
  • 5
  • 32
  • 57
  • Check this Post - http://stackoverflow.com/questions/2630440/how-to-delete-duplicates-on-mysql-table – rs. Oct 12 '12 at 19:15
  • 3
    Create another (temporary) table with a `UNIQUE` key, then insert over and it should ignore duplicates. – Kermit Oct 12 '12 at 19:16
  • 1
    +1 to @njk. I guess the lesson here is never to create a table without a unique key, preferably an auto-increment ID field. – Spudley Oct 12 '12 at 19:18
  • The table isn't an auto increment. It's just a relationship table between two other tables. So technically I'm allowed duplicates in either column, but the combination of the two columns (row) CANNOT exist twice. – user82302124 Oct 12 '12 at 19:29
  • @user82302124, can you provide the actual table structures? In your answer, I don't see what 'two columns' you just referred to. – Ross Smith II Oct 12 '12 at 19:39
  • Weird - I swore I had it originally.... Added table example – user82302124 Oct 12 '12 at 19:44
  • I've updated my post - it's more clear about my directive. – user82302124 Oct 12 '12 at 21:17
  • I think @RossSmithII is asking for the structures of the tables that `people` establishes relationship between. – Miserable Variable Oct 12 '12 at 21:31

3 Answers3

3

This should work:

ALTER IGNORE TABLE people ADD UNIQUE (a,b);

If you don't want to add an index, then this should work:

DROP TABLE IF EXISTS people_old;
DROP TABLE IF EXISTS people_new;
CREATE TABLE people_new LIKE people;
INSERT INTO people_new SELECT DISTINCT * FROM people;
RENAME TABLE people TO people_old, people_new TO people;
Ross Smith II
  • 11,799
  • 1
  • 38
  • 43
  • I'm not sure if this is what I want with my example - I'm not looking to make any column unique? – user82302124 Oct 12 '12 at 20:42
  • Is the concern about adding an index, that by doing so, the code that inserts the records will fails when inserting the duplicates? You could solve this by changing the `INSERT`s to `INSERT IGNORE`s. – Ross Smith II Oct 12 '12 at 21:26
1

This is how you can delete duplicate rows... I'll write you my example and you'll need to apply to your code. I have Actors table with ID and I want to delete the rows with repeated first_name

mysql> select actor_id, first_name from actor_2;
+----------+-------------+
| actor_id | first_name  |
+----------+-------------+
|        1 | PENELOPE    |
|        2 | NICK        |
|        3 | ED          |
....
|      199 | JULIA       |
|      200 | THORA       |
+----------+-------------+

200 rows in set (0.00 sec)

-Now I use a Variable called @a to get the ID if the next row have the same first_name(repeated, null if it's not).

mysql> select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name;
+---------------+----------------+
|  first_names  | @a:=first_name |
+---------------+----------------+
|          NULL | ADAM           |
|            71 | ADAM           |
|          NULL | AL             |
|          NULL | ALAN           |
|          NULL | ALBERT         |
|           125 | ALBERT         |
|          NULL | ALEC           |
|          NULL | ANGELA         |
|           144 | ANGELA         |
...
|          NULL | WILL           |
|          NULL | WILLIAM        |
|          NULL | WOODY          |
|            28 | WOODY          |
|          NULL | ZERO           |
+---------------+----------------+
200 rows in set (0.00 sec)

-Now we can get only duplicates ID:

mysql> select first_names from (select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name) as t1;
+-------------+
| first_names |
+-------------+
|        NULL |
|          71 |
|        NULL |
 ...
|          28 |
|        NULL |
+-------------+
200 rows in set (0.00 sec)

-the Final Step, Lets DELETE!

mysql> delete from actor_2 where actor_id in (select first_names from (select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name) as t1);
Query OK, 72 rows affected (0.01 sec)

-Now lets check our table:

mysql> select count(*) from actor_2 group by first_name;
+----------+
| count(*) |
+----------+
|        1 |
|        1 |
|        1 |
...
|        1 |
+----------+
128 rows in set (0.00 sec)

it works, if you have any question write me back

jcho360
  • 3,724
  • 1
  • 15
  • 24
1

There can be duplicates for both columns - but there CANNOT be the same row twice

That's a constraint on the table that you have not implemented. The constraint is a unique index on (a,b). If you had the index you would not have duplicates.

IMHO your best approach is to add the unique index to the table, using a temporary table to first remove the duplicates:

  1. Copy person to person_temp
  2. Delete all from person
  3. Add unique index to person
  4. Copy unique a,b from person_temp to `person.
Miserable Variable
  • 28,432
  • 15
  • 72
  • 133