0

The table I have is

id    user_id    date        created_at
1     123     2020-02-02  2020-02-02 10:00:00
2     123     2020-02-02  2020-02-02 10:00:01
3     789     2020-02-12  2020-02-12 12:00:00
4     456     2020-02-10  2020-02-10 10:00:00
5     456     2020-02-10  2020-02-10 10:00:01

I want to delete duplicate entries and I want the desired output -

id    user_id    date        created_at
1     123     2020-02-02  2020-02-02 10:00:00
3     789     2020-02-12  2020-02-12 12:00:00
4     456     2020-02-10  2020-02-10 10:00:00

I tried the following query -

DELETE
    `a`
FROM
    `table1` AS `a`,
    `table1` AS `b`
WHERE
    `a`.`id` < `b`.`id` AND `a`.`user_id` <=> `b`.`user_id`

But it's taking too long and the error I get is

Lock wait timeout exceeded; try restarting transaction

The table I have has more than 9500000 entries.

What could be a better alternative query?

rik_maz
  • 69
  • 5

2 Answers2

1

You have lots of records hence it is giving timeout error. Perform this operation in smaller chunk. Try below query

DELETE
        `a`
    FROM
        `table1` AS `a` inner join
        `table1` AS `b`
    WHERE
        `a`.`id` < `b`.`id` AND `a`.`user_id` == `b`.`user_id` and `a`.`id` >***10000(Increase this)***

Hope this will helpful for you.

chaitanya dalvi
  • 1,539
  • 2
  • 17
  • 25
0

Try to use correlated subquery:

DELETE t1
FROM table1 t1
WHERE EXISTS ( SELECT NULL
               FROM table1 t2
               WHERE t1.user_id = t2.user_id
                 AND t1.id > t2.id )

The index (user_id, id) will increase the query speed.

Akina
  • 39,301
  • 5
  • 14
  • 25