
I am trying to remove duplicate rows from a table in my database, and I am using this query:

DELETE  FROM data
WHERE data.ID NOT IN (
                     SELECT * FROM ( 
                                    SELECT MIN(ID)  FROM data GROUP BY Link
                                   ) AS p 
                      ) 

It works, but the table has over 1 million rows, so the query takes a very long time: after 4 to 5 hours it was still running, and I closed the tab. If someone has a faster query, please tell me. Thanks in advance.

Table Structure http://s29.postimg.org/bt57k5enb/image.jpg

Brett Schneider
  • What is the structure of your table? What determines if a row is a duplicate? – Cully May 08 '14 at 07:56
  • Is this just a one-time thing you want to do? If not, why wouldn't you just prevent duplicates from getting into your database in the first place? – Cully May 08 '14 at 07:57
  • are you really using `SQL-Server` AND `MySQL`? – DrCopyPaste May 08 '14 at 07:57
  • I am using MySQL through phpMyAdmin on a server. The table name is data. Columns: ID, Link, Title, Word, Size – user3532237 May 08 '14 at 08:01
  • So "Link" is the column you want to make unique? – avisheks May 08 '14 at 08:03
  • Why do you use 3 nested SELECTs? Two should be enough: `DELETE FROM data WHERE data.ID NOT IN (SELECT MIN(ID) FROM data GROUP BY Link)`. This probably won't solve your issue, but your query may be a little faster – arilia May 08 '14 at 08:15
  • That gives me this error, which is why I am using the extra SELECT: `You can't specify target table 'data' for update in FROM clause` @arilia – user3532237 May 08 '14 at 08:21
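
The error quoted in the last comment (MySQL error 1093) is raised because the DELETE target is read inside a subquery. One way to avoid both the error and the extra derived table is a multi-table DELETE with a self-join. This is only a sketch, not a measured fix: it assumes Link is a column type that can be indexed in full (for example a VARCHAR of moderate length), and `idx_data_link` is a made-up index name.

```sql
-- Assumption: Link can be indexed in full; idx_data_link is a hypothetical name.
CREATE INDEX idx_data_link ON data (Link);

-- Delete every row that has a sibling with the same Link and a smaller ID,
-- so only the row with the lowest ID per Link survives.
DELETE d1
FROM data AS d1
JOIN data AS d2
  ON d2.Link = d1.Link
 AND d2.ID < d1.ID;
```

Unlike the NOT IN query, this leaves rows whose Link is NULL untouched, because NULL never compares equal to NULL in the join condition.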

1 Answer


One solution could be:


1) Create a temporary table
2) Store a single record for each Link value in it
3) Truncate the "data" table
4) Alter the "data" table (add a UNIQUE KEY constraint on Link)
5) Re-import the rows from the temporary table into "data" and drop the temporary table

1&2) CREATE TABLE tmp AS SELECT * FROM data GROUP BY Link;
3) TRUNCATE TABLE data; -- disable foreign key constraints if any 
4) ALTER TABLE data ADD UNIQUE KEY data_link_unique(Link);
5) INSERT INTO data SELECT * FROM tmp; DROP TABLE tmp;
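
One caveat on step 1&2: `SELECT * ... GROUP BY Link` lets MySQL pick an arbitrary row per Link, and it is rejected outright when the ONLY_FULL_GROUP_BY SQL mode is enabled. A possible variant that keeps the row with the lowest ID per Link, as the query in the question does, could look like the sketch below. The alias `first_ids` is a name introduced here for illustration, and the unique key assumes Link is a column type that can be indexed in full.

```sql
-- 1&2) Copy exactly one row per Link: the one with the smallest ID.
CREATE TABLE tmp AS
SELECT d.*
FROM data AS d
JOIN (
    SELECT MIN(ID) AS ID
    FROM data
    GROUP BY Link
) AS first_ids
  ON first_ids.ID = d.ID;

-- 3) Empty the original table (disable foreign key checks first if needed).
TRUNCATE TABLE data;

-- 4) Prevent new duplicates from being inserted.
ALTER TABLE data ADD UNIQUE KEY data_link_unique (Link);

-- 5) Copy the de-duplicated rows back and clean up.
INSERT INTO data SELECT * FROM tmp;
DROP TABLE tmp;
```

It is worth checking `SELECT COUNT(*) FROM tmp;` against the expected number of distinct links before running the TRUNCATE, since that step cannot be rolled back.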
avisheks