0

I'm trying to remove duplicates from a MySQL table where two column values will be the same.

In this case, I want to, say, have an id column (called nid), and a hash column with the same values:

| nid |    hash    |
|  2  |   932298   |
|  2  |   932298   |

I'd like only one of them to survive, preferably the first one inserted in the database.

I'm looking at this post but my use case is slightly different:

MySQL remove duplicates from big database quick

I'm also open to other options

Community
  • 1
  • 1
Steven Matthews
  • 9,705
  • 45
  • 126
  • 232

2 Answers2

1
ALTER IGNORE TABLE `table_name` ADD UNIQUE (`hash`)
Steven Matthews
  • 9,705
  • 45
  • 126
  • 232
Jerko W. Tisler
  • 996
  • 9
  • 29
  • Wouldn't hash be the primary key? There can be multiple nids, but I only want one hash value per nid (and theoretically, it should be a unique hash id /period/) – Steven Matthews Feb 26 '14 at 14:59
  • You have to explain me this problem a little bit, do you need unique pairs or unique hashes? – Jerko W. Tisler Feb 26 '14 at 15:00
  • Ok, so there will be multiple nids with the exact same value - there could be 30 different rows with an nid of 2, for example. The hash is calculated based on the properties of an object related to the database table, and should theoretically be unique - if there are two hashes that are exactly the same, that means that the two objects/rows are the same, and I only want one of them (preferably the first one inserted into the database) to remain. Is that clearer? – Steven Matthews Feb 26 '14 at 15:03
  • can you please confirm this? You can have pairs (nid, hash) = (1,1), (1,2), (2,1), (2,2)? is that true? – Jerko W. Tisler Feb 26 '14 at 15:04
  • You could theoretically have (1, 1), (1, 2), (2, 1), (2, 2), but the way the hash is calculated, it would be almost impossible to have the same hash for rows with a different nid, so it is best to assume that the only time hash will be the same is when it has the same nid. And all rows where there are duplicate hashes can be discarded. – Steven Matthews Feb 26 '14 at 15:06
  • With your edited query: Specified key was too long; max key length is 767 bytes – Steven Matthews Feb 26 '14 at 15:07
  • to avoid duplicate answers read http://stackoverflow.com/questions/19940642/mysqlerror-specified-key-was-too-long-max-key-length-is-767-bytes-create-in – Jerko W. Tisler Feb 26 '14 at 15:09
  • I modified my varchar length to 255 (to solve the prior problem), and now it is giving the error, "multiple primary key defined" - but the IGNORE should take care of that, shouldn't it? I think we're on the right path though. – Steven Matthews Feb 26 '14 at 15:17
  • The problem is that the primary key isn't to be trusted for this particular operation, otherwise this would be trivial. – Steven Matthews Feb 26 '14 at 15:23
  • Yes that's why i wrote ALTER IGNORE, if you are using MySQL and you are, it should remove duplicates – Jerko W. Tisler Feb 26 '14 at 15:23
  • Hrm, then why would it give the error about multiple primary keys being defined when ignore was given too? Would ADD UNIQUE work instead of ADD PRIMARY? – Steven Matthews Feb 26 '14 at 15:24
  • You can use ADD UNIQUE(`hash`) instead of ADD PRIMARY KEY(`hash`) – Jerko W. Tisler Feb 26 '14 at 15:26
  • Ok, this seems to work. You get the answer. Is there a way to ensure it gets the first item inserted into the database? – Steven Matthews Feb 26 '14 at 15:29
0

simplest way of achieving it I believe:

1) create table `table_copy`(id int primary key,hash varchar(255), unique(`hash`)) select distinct id,hash from `table_name`;
2) drop table `table_name`;
3) rename table `table_copy` to `table_name`;
avisheks
  • 1,178
  • 10
  • 27
  • You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''table_copy'(nid,hash) as select distinct nid,hash from node_revision' at line 1 – Steven Matthews Feb 26 '14 at 14:50
  • @AndrewAlexander, edits being made try now..syntax were not exact but anyway you get the idea yeah? – avisheks Feb 26 '14 at 17:26