I've received a reasonably big MySQL database with a lot of duplicates. Around a third of the database are duplicates. It is also missing a primary key.
The structure of the database is so:
unique_id | field01 | field02 | field03 | field04 | ...... | field26 | field27 |
Now while the unique_id is supposed to be, well, unique; there are a lot of duplicates.
Also, for a set of rows belonging to copies of a particular unique_id, the remaining columns may or may not be the same. For example, looking at unique_id 'id_1' below, field01 is same in both the instances, but field02 & field03 are not.
I would like to remove the repetitions and have only one copy of each unique_id survive. It does not matter which one survives.
Example:
id_1 | abc | dfd | NULL | ... | def |
id_2 | abc | daf | ghi | ... | 12a |
id_1 | abc | xyz | jkl | ... | def |
id_4 | aaa | bbb | NULL | ... | def |
id_3 | NULL | bbb | NULL | ... | 123 |
id_5 | 1e3 | NULL | NULL | ... | def |
id_3 | aaa | bbb | fds | ... | def |
id_9 | awa | bbb | NULL | ... | 910 |
Should become:
id_1 | abc | dfd | NULL | ... | def |
id_2 | abc | daf | ghi | ... | 12a |
id_4 | aaa | bbb | NULL | ... | def |
id_3 | NULL | bbb | NULL | ... | 123 |
id_5 | 1e3 | NULL | NULL | ... | def |
id_9 | awa | bbb | NULL | ... | 910 |
Or this is also fine:
id_2 | abc | daf | ghi | ... | 12a |
id_1 | abc | xyz | jkl | ... | def |
id_4 | aaa | bbb | NULL | ... | def |
id_5 | 1e3 | NULL | NULL | ... | def |
id_3 | aaa | bbb | fds | ... | def |
id_9 | awa | bbb | NULL | ... | 910 |
Once this is done, I would need to set unique_is as the primary key.
Please advice the best and accurate way to do this. Thanks in advance.