1

For each user_id (group) in a table I want to delete all records except the records with the highest ID in each group. Something similar to this: Delete all records except the most recent one?

In my case it would look like:

DELETE FROM logins
WHERE user_id IS NOT NULL AND id NOT IN (SELECT MAX(id) FROM logins WHERE user_id IS NOT NULL GROUP BY user_id)

The problem is, the table holds 100+ millon records, so doing this in a single query like that would be impossible performance wise.

What other ways could I do this, e.g. in some batches?

Niels Kristian
  • 8,661
  • 11
  • 59
  • 117

1 Answers1

0

I would be inclined to phrase the delete as:

DELETE FROM logins l
    WHERE l.user_id IS NOT NULL AND
          l.id <> (SELECT MAX(l2.id)
                   FROM logins l2
                   WHERE l2.user_id = l.user_id
                  );

This can easily take advantage of an index on logics(user_id, id). However, if you are deleting a lot of rows, it is often better to truncate the table and recreate:

create table temp_logics as
    select l.*
    from logins
    where l.id = (SELECT MAX(l2.id)
                  FROM logins l2
                  WHERE l2.user_id = l.user_id
                 );

truncate table logins;

insert into logins
    select * from temp_logins;

If you take this approach, be sure to validate temp_logins before truncating the second table. Actually, you do not want to use a temporary table for the intermediate table, because something might happen to the database -- and you would lose the data.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Are you sure the first query is doing the same as mine (preserving the largest ID for each group of user_id)? Also, would that one be suitable for deleting one million records+ at a time? – Niels Kristian Nov 06 '17 at 14:22
  • @NielsKristian . . . The first is deleted all but the largest id for every `user_id`. – Gordon Linoff Nov 06 '17 at 23:22
  • Thanks worked. Deleted a lot of data in batches first. Then I ran the first query, worked fine for +2m rows – Niels Kristian Nov 07 '17 at 09:49