
I have a table with duplicate entries in the column "image_url". Now I want to remove the duplicates and keep just one record for each value.

Here is the query I tried:

DELETE t1
FROM
    ad_images_copy t1
JOIN ad_images_copy t2 ON t2.image_url = t1.image_url
AND t2.id < t1.id

I took it from a reference link for removing duplicate rows, but it gave me this error:

Error: Lock wait timeout exceeded; try restarting transaction

Now I want to limit this query, because I have too many records.

I tried adding LIMIT at the end, but it did not work for me:

DELETE t1
    FROM
        ad_images_copy t1
    JOIN ad_images_copy t2 ON t2.image_url = t1.image_url
    AND t2.id < t1.id
LIMIT 100;

Is there any other way to do this?
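A hedged aside on why the LIMIT attempt above fails: MySQL accepts LIMIT only on the single-table form of DELETE, not on the multi-table JOIN form. One way to batch the deletion, assuming id is the primary key (the batch size of 1000 is an illustrative choice), is to push the self-join into a derived table:

```sql
-- LIMIT is only valid on a single-table DELETE in MySQL, so rewrite the
-- self-join as a subquery. Wrapping it in a derived table (here aliased
-- as dup) is needed because MySQL otherwise refuses to modify a table
-- it is also selecting from.
DELETE FROM ad_images_copy
WHERE id IN (
    SELECT id FROM (
        SELECT t1.id
        FROM ad_images_copy t1
        JOIN ad_images_copy t2
          ON t2.image_url = t1.image_url
         AND t2.id < t1.id
    ) AS dup
)
LIMIT 1000;
-- Re-run this statement until it affects 0 rows.
```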

Haseeb

1 Answer


One thing to check is that image_url and id are indexed. (The query self-joins ad_images_copy, so there is only one table to index. id is probably the primary key, so it'll be indexed, but you should check that image_url is indexed.) If not, add an index.

If so, you might try avoiding the locking by doing it in two steps. Are your rows immutable? If they are, then you can use a SELECT query to find all the rows you want to delete, and then use some batched DELETEs to get rid of them. This means that you won't end up locking the whole table.

It's safe to do it this way as long as you know that the rows you're looking at aren't going to change, or get removed, between the SELECT and the DELETE.
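A minimal sketch of that two-step approach, assuming id is the primary key; the scratch table name ids_to_delete and the batch size are illustrative choices, not anything from the question:

```sql
-- Step 1: collect the ids of the duplicate rows (every row that has a
-- lower-id twin with the same image_url) into a scratch table. DISTINCT
-- matters because a row with several lower-id twins would otherwise
-- appear once per twin.
CREATE TABLE ids_to_delete AS
SELECT DISTINCT t1.id
FROM ad_images_copy t1
JOIN ad_images_copy t2
  ON t2.image_url = t1.image_url
 AND t2.id < t1.id;

-- Step 2: delete in small batches so no single statement holds locks
-- for long. Re-run until it affects 0 rows, then drop the scratch table.
DELETE FROM ad_images_copy
WHERE id IN (SELECT id FROM ids_to_delete)
LIMIT 1000;
```

Because the DELETE in step 2 is the single-table form, MySQL accepts the LIMIT clause, which is what the multi-table JOIN form in the question does not allow.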

chiastic-security
  • No, image_url is not indexed, and it currently has duplicate entries. How can I add an index on it? – Haseeb Feb 10 '16 at 13:09
  • Ah, you need to index it: `ALTER TABLE ad_images_copy ADD INDEX idx_image_url (image_url)` should do it. It'll take some time. – chiastic-security Feb 10 '16 at 13:11
  • I tried this multiple times, but it gives me the error "Lost connection to MySQL server during query". – Haseeb Feb 10 '16 at 13:25
  • Well, it's the right query. You might need to experiment with your timeout settings in whatever client you're using. Or add the indexes from the command line. – chiastic-security Feb 10 '16 at 13:55
  • Yes, I tried it from the command line, and the query has been in progress for the last 30 minutes :( – Haseeb Feb 10 '16 at 14:00
  • Yes, it might take a very long time (even overnight), depending on how many rows you have. – chiastic-security Feb 10 '16 at 14:02