5

I've received a reasonably big MySQL database with a lot of duplicates. Around a third of the database are duplicates. It is also missing a primary key.

The structure of the database is so:

unique_id | field01 | field02 | field03 | field04 | ...... | field26 | field27 |

Now while the unique_id is supposed to be, well, unique; there are a lot of duplicates.

Also, for a set of rows belonging to copies of a particular unique_id, the remaining columns may or may not be the same. For example, looking at unique_id 'id_1' below, field01 is same in both the instances, but field02 & field03 are not.

I would like to remove the repetitions and have only one copy of each unique_id survive. It does not matter which one survives.

Example:

id_1 | abc  | dfd  | NULL | ... | def | 
id_2 | abc  | daf  | ghi  | ... | 12a |
id_1 | abc  | xyz  | jkl  | ... | def |
id_4 | aaa  | bbb  | NULL | ... | def |
id_3 | NULL | bbb  | NULL | ... | 123 |
id_5 | 1e3  | NULL | NULL | ... | def |
id_3 | aaa  | bbb  | fds  | ... | def |
id_9 | awa  | bbb  | NULL | ... | 910 |

Should become:

id_1 | abc  | dfd  | NULL | ... | def | 
id_2 | abc  | daf  | ghi  | ... | 12a |
id_4 | aaa  | bbb  | NULL | ... | def |
id_3 | NULL | bbb  | NULL | ... | 123 |
id_5 | 1e3  | NULL | NULL | ... | def |
id_9 | awa  | bbb  | NULL | ... | 910 |

Or this is also fine:

id_2 | abc  | daf  | ghi  | ... | 12a |
id_1 | abc  | xyz  | jkl  | ... | def |
id_4 | aaa  | bbb  | NULL | ... | def |
id_5 | 1e3  | NULL | NULL | ... | def |
id_3 | aaa  | bbb  | fds  | ... | def |
id_9 | awa  | bbb  | NULL | ... | 910 |

Once this is done, I would need to set unique_is as the primary key.

Please advice the best and accurate way to do this. Thanks in advance.

3 Answers3

2

You could add a unique index in your table, ignoring all errors, and let MySql delete all duplicated rows (except one) for you:

ALTER IGNORE TABLE your_table ADD UNIQUE INDEX dupidx (unique_id)

but if you still get an error, even if you use IGNORE clause, try this:

ALTER TABLE your_table ENGINE MyISAM;
ALTER TABLE IGNORE your_table ADD UNIQUE INDEX dupidx (unique_id);
ALTER TABLE your_table ENGINE InnoDB;

If you are just interested in showing your data, keeping just one row for each duplicated id, you could use this:

SELECT *
FROM your_table
GROUP BY unique_id
fthiella
  • 48,073
  • 15
  • 90
  • 106
1

I had that problem, and this worked great for me:

CREATE TABLE newtable SELECT * FROM oldtable GROUP BY unique_id;

Then drop the oldtable, and rename the new table to the old table name

Brett Stubbs
  • 669
  • 1
  • 9
  • 19
0

try this :

select unique_id, field01 , field02 , field03 from (table_name) group by unique_id having sum(unique_id)<1
Ankit Bajpai
  • 13,128
  • 4
  • 25
  • 40
dtnder
  • 381
  • 2
  • 10
  • 25