0

Notes:

  • The return from SELECT version() is 10.5.12-MariaDB-log
  • Default collation: utf8mb4_unicode-ci
  • Default charset: utf8mb4
  • Queries run using MySQL Workbench for Ubuntu Linux 8.0.29

My goal is to delete duplicated items in a table. I do not have any other table to use as a reference to check duplicates. I created a simple query and subquery that returns expected results:


    SELECT * FROM messages WHERE id NOT IN
        (SELECT id FROM
            messages
        WHERE
            uid = '11899414026778263'
        GROUP BY message_id , uid
        ORDER BY created_at);

Despite setting SQL_SAFE_UPDATES to 0, a DELETE operation using the same data fails. I get a "QUERY Interrupted" message.

    SET SQL_SAFE_UPDATES = 0;
    DELETE FROM messages WHERE id NOT IN
        (SELECT id FROM
            messages
        WHERE
            uid = '11899414026778263'
        GROUP BY message_id , uid
        ORDER BY created_at);
    SET SQL_SAFE_UPDATES = 1;

If I replace DELETE with SELECT *, the query returns results. Another StackOverflow answer said that querying based on a sub-query does not work in MySQL. Others say to use another table as reference instead of a subquery. DELETE query results in 'Query Interrupted' MySQL Workbench?

This method works in some SQL implementations based on these answers and websites:

Joachim Rives
  • 471
  • 6
  • 19
  • your subquery doesn't seem appropriate to find duplicate rows. can you explain how you mean for it to work? – ysth Jun 29 '22 at 21:01
  • @ysth There are rows with repeated combinations of `message_id` and `uid`. I want to delete those rows. There is a separate primary key for this table called `id`. – Joachim Rives Jun 29 '22 at 21:03
  • 1
    your inner query will (if you have ONLY_FULL_GROUP_BY off) find an arbitrary id for each message_id for the specified uid. then the outer query deletes all other ids for that uid and *all* ids for any other uid. surely you don't want that. – ysth Jun 29 '22 at 21:05
  • this kind of query indeed does not work in mysql, but that is fixed in mariadb. using a join could still be better; queries with large IN sets are not optimized as well in all cases in my experience – ysth Jun 29 '22 at 21:27

1 Answers1

0

order by is applied after grouping, so your order by is not sufficient to select the id with the lowest created_at for each group. Your query will fail under ONLY_FULL_GROUP_BY because the id returned by the subselect will be arbitrary within each group. You want to use first_value instead.

But it's easier just to not use a subquery:

delete m
from messages m
# is there a message we would prefer over this one?
inner join messages m2 on (m2.uid,m2.message_id)=(m.uid,m.message_id) and (m2.created_at,m2.id) < (m.created_at,m.id)
ysth
  • 96,171
  • 6
  • 121
  • 214
  • Hello. I cannot test this answer anymore since I just restored the database to a month-old version instead. It is missing some data but still works. If anyone finds this useful, please upvote this answer. If I happen to see many up-votes, I will accept this answer. – Joachim Rives Jul 04 '22 at 10:33