2

I just found out that a table on my production server (which holds approx. 35K records) contains 588 duplicate entries in an INT(11) column which has AUTO_INCREMENT. The UNIQUE key is missing on that column so that's probably the cause.

Any ideas on how to give all duplicate entries a unique ID and then adding the UNIQUE key to the column so this will never happen again?

Table schema:

CREATE TABLE `items` (
 `item_ID` int(11) unsigned NOT NULL auto_increment,
 `u_ID` int(10) NOT NULL default '0',
 `user_ID` int(11) NOT NULL default '0',
 `p_ID` tinyint(4) NOT NULL default '0',
 `url` varchar(255) NOT NULL,
 `used` int(10) unsigned NOT NULL,
 `sort` tinyint(4) NOT NULL,
 `last_checked` int(11) NOT NULL,
 `unixtime` int(11) NOT NULL,
 `switched` int(11) NOT NULL,
 `active` tinyint(1) NOT NULL default '0',
 UNIQUE KEY `unique` (`p_ID`,`url`),
 KEY `index` (`u_ID`,`item_ID`,`sort`,`active`),
 KEY `index2` (`u_ID`,`switched`,`active`),
 KEY `item_ID` (`item_ID`),
 KEY `p_ID` (`p_ID`),
 KEY `u_ID` (`u_ID`)
) ENGINE=MyISAM AUTO_INCREMENT=42755 DEFAULT CHARSET=utf8
ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
Thomas G
  • 100
  • 9

3 Answers3

2

How about something like this? Again test it on a backup first.

# Copy duplicate records
CREATE TABLE newitem SELECT * FROM items WHERE item_ID IN 
    (SELECT item_ID FROM itemd GROUP BY item_ID HAVING COUNT(*) > 1);

# remove auto increment from id in new table
ALTER TABLE newitem DROP INDEX Item_ID, MODIFY item_ID int;

# delete duplicates from original
DELETE FROM item WHERE item_ID IN (SELECT DISTINCT item_ID FROM newitem);

#Update column to be primary key
ALTER TABLE items DROP INDEX Item_ID, ADD PRIMARY KEY (Item_ID);

# Set new duplicate ID's to null
UPDATE newitem SET item_ID=NULL;

# Insert records back into old table
INSERT INTO item SELECT * FROM newitem;

# Get rid of work table
DROP newitem;
Adrian Cornish
  • 23,227
  • 13
  • 61
  • 77
2

Since you already have a UNIQUE key on the table, you could use this to make an UPDATE statement that re-assigns unique ids to item_id:

UPDATE
        items AS it
    JOIN
        ( SELECT 
              i.p_ID, i.url, @id:= @id+1 AS id
          FROM 
                  items AS i  
              CROSS JOIN 
                  ( SELECT @id:=0 ) AS dummy
          ORDER BY
              i.p_ID, i.url
        ) AS unq
      ON 
      (unq.p_ID, unq.url) = (it.p_ID, it.url)
SET 
    it.item_id = unq.id ;

Then you can add a unique index on item_id

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
1

Interesting. You have an auto_increment without a Primary Key reference, just an index, this is why you have dupes in the first place. If you try to update and assign primary key (item_ID) MySQL will complain because of the dupes in the item_ID column.

Your engine is MyISAM which means you don't have any FK constraints, so you could do a mysqldump of the table, truncate the table, update the schema, then re-import the data. Upon re-import MySQL should correctly insert all rows with truly unique Item_Ids.

I'll outline the steps here, but I strongly suggest you do this in a dev environment to confirm the steps work correctly, before applying to your production environment. I accept no responsibility for borked production data :)

$ mysqldump -u <user_name> -h <db_host> --opt <database_name> --single-transaction > backup.sql

mysql> truncate table `items`;

mysql> ALTER TABLE `items` DROP INDEX `Item_ID`, ADD PRIMARY KEY (`item_ID`), AUTO_INCREMENT = 1;

$ vi backup.sql # Remove the AUTO_INCREMENT reference from the Create Table syntax

$ mysql -h <host_name> <db_name> -u <username> -p < backup.sql    

Give that a shot, these steps are un-tested but should set you down the right path.

Mike Purcell
  • 19,847
  • 10
  • 52
  • 89
  • this will regenerate the item_ID's right? I want to keep my current ID's and only update the records with a new ID that have duplicate ID's. I could use a while loop in PHP to check whether a record is a duplicate one and give it a new ID with `MAX(item_ID)` but it sounds risky – Thomas G Dec 12 '11 at 22:42
  • Yes it will regenerate them. If that is the case then you can write an app to loop through the ids, identify the dupes, then assign the dupes new values. Then when the new values have been assigned, run the alter statement to indicate that `Item_ID` is a `primary key` and not just an index. – Mike Purcell Dec 12 '11 at 22:55