72

I find myself wanting to delete rows older than (x)-days on a rolling basis in a lot of applications. What is the best way to do this most efficiently on a high-traffic table?

For instance, if I have a table that stores notifications and I only want to keep these for 7 days. Or high scores that I only want to keep for 31 days.

Right now each row stores the epoch time it was posted, and a cron job runs once per hour and deletes rows in increments like this:

DELETE FROM my_table WHERE time_stored < 1234567890 LIMIT 100

I do that until mysql_affected_rows returns 0.

I used to do it all at once but that caused everything in the application to hang for 30 seconds or so while INSERTS piled up. Adding the LIMIT worked to alleviate this but I'm wondering if there is a better way to do this.

Brad Dwyer
  • Did you consider running the cron job during off-hours (12am-2am), or are you still serving heavy traffic at that time? – Mike Purcell Feb 27 '12 at 21:01
  • +1 @MikePurcell I usually schedule cron jobs when the activity is at its lowest. – Mārtiņš Briedis Feb 27 '12 at 21:09
  • Do you have an index set on the `time_stored` column? Not having one might be responsible for your slow `DELETE` query. – Thorsten Feb 27 '12 at 21:10
  • Mike - we have lots of users in Australia as well so while we do have off-peak hours they're still very busy. – Brad Dwyer Feb 27 '12 at 21:11
  • Thorsten - yep, I think the performance issue comes from the number of rows we're deleting (tens of thousands per hour). – Brad Dwyer Feb 27 '12 at 21:12
  • If you are deleting rows: (too) many indexes on various columns in the table also cause slow deletion, because these indexes have to be updated as well. – The Nail Feb 27 '12 at 21:25
  • Are you re-optimizing your tables after the deletes? They might be super-fragmented. – SenorAmor Feb 27 '12 at 21:32

6 Answers

85

Try creating an Event that runs on the database automatically at the interval you want.

Here is an example: suppose you want to delete entries that are more than 30 days old from a table 'tableName' that has a DATETIME column 'datetime'. The following event runs every day and performs the required clean-up:

CREATE EVENT AutoDeleteOldNotifications
ON SCHEDULE EVERY 1 DAY
STARTS CURRENT_TIMESTAMP + INTERVAL 1 DAY
ON COMPLETION PRESERVE
DO
DELETE LOW_PRIORITY FROM databaseName.tableName WHERE datetime < DATE_SUB(NOW(), INTERVAL 30 DAY);

We need to add ON COMPLETION PRESERVE to keep the event after each run. You can find more info here: http://www.mysqltutorial.org/mysql-triggers/working-mysql-scheduled-event/
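Note that events only fire when the MySQL event scheduler is running. A quick way to check and, if needed, enable it (assuming you have the privilege to set global variables; this snippet is illustrative and not part of the original answer):

SHOW VARIABLES LIKE 'event_scheduler';  -- should report ON
SET GLOBAL event_scheduler = ON;        -- turn it on for the running server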

scytale
Akshay
39

Check out MySQL Partitioning:

Data that loses its usefulness can often be easily removed from a partitioned table by dropping the partition (or partitions) containing only that data. Conversely, the process of adding new data can in some cases be greatly facilitated by adding one or more new partitions for storing specifically that data.

See e.g. this section to get some ideas on how to apply it:

MySQL Partition Pruning

And this one:

Partitioning by dates: the quick how-to
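As a rough sketch of how this could look for the table in the question (the column name time_stored comes from the question; the table layout, partition names, and boundary dates are assumptions for illustration only), a table range-partitioned on the epoch column lets you expire a whole time range with a single DROP PARTITION instead of row-by-row deletes:

-- Illustrative only: range-partition on the epoch column
CREATE TABLE my_table (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    time_stored INT UNSIGNED NOT NULL,   -- epoch seconds, as in the question
    payload VARCHAR(255),
    PRIMARY KEY (id, time_stored)        -- the partitioning column must be part of every unique key
)
PARTITION BY RANGE (time_stored) (
    PARTITION p_week08 VALUES LESS THAN (UNIX_TIMESTAMP('2012-02-27 00:00:00')),
    PARTITION p_week09 VALUES LESS THAN (UNIX_TIMESTAMP('2012-03-05 00:00:00')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Expiring a week of data is then a quick metadata operation, not a row-by-row DELETE:
ALTER TABLE my_table DROP PARTITION p_week08;

New partitions do have to be added ahead of time as part of the same scheduled job, for example by splitting the catch-all partition with ALTER TABLE ... REORGANIZE PARTITION.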

The Nail
  • Awesome, this looks promising. I'll have to do some benchmarking on this! – Brad Dwyer Feb 27 '12 at 21:24
  • Read through the articles you posted and this looks like exactly what I was looking for. Thanks! (accepted answer) – Brad Dwyer Feb 27 '12 at 21:30
  • Just wanted to update that I have this working now and it's great! Especially because Partition Pruning means the partition that I'm dropping isn't even being accessed at all once it's out of date so long as I put a bound on the SELECTS. This resource helped a lot: http://www.slideshare.net/datacharmer/mysql-partitions-tutorial – Brad Dwyer Feb 28 '12 at 01:59
  • @BradDwyer: Nice! Always good to hear when a plan comes together. Thanks for reporting back. – Mike Purcell Feb 28 '12 at 05:02
  • The second link seems to be gone. I found this other one that might be the same: https://dev.mysql.com/doc/internals/en/optimizer-partitioning-related.html#optimizer-partition-pruning-overview – solarc Nov 02 '18 at 23:29
  • This solution has the disadvantage that old records are being kept in the db wasting memory for no reason – Sebi2020 Feb 09 '21 at 12:17
  • @Sebi2020 if too many records are kept, maybe this can be solved by making the partitions smaller? According to https://dev.mysql.com/doc/refman/8.0/en/partitioning-limitations.html#partitioning-limitations-max-partitions there can be up to 8192 partitions – The Nail Feb 15 '21 at 10:52
  • @TheNail I'm not familiar with partitions, but I think it would be good to combine this approach with something that would delete all rows in the "old rows" partition. I'm not sure if MySQL supports something like TRUNCATE PARTITION. – Sebi2020 Feb 18 '21 at 14:40
3

Instead of executing the delete against the table alone, try gathering the matching keys first and then doing a DELETE JOIN.

Given your sample query above:

DELETE FROM my_table WHERE time_stored < 1234567890 LIMIT 100;

You can leave the LIMIT out of it.

Let's say you want to delete data that is over 31 days old.

Let's compute 31 days in seconds (86400 X 31 = 2678400)

  • Start with key gathering
  • Next, index the keys
  • Then, perform DELETE JOIN
  • Finally, drop the gathered keys

Here is the algorithm

-- Step 1: key gathering (create an empty keys table, then collect matching ids)
CREATE TABLE delete_keys SELECT id FROM my_table WHERE 1=2;
INSERT INTO delete_keys
SELECT id FROM
(
    SELECT id FROM my_table
    WHERE time_stored < (UNIX_TIMESTAMP() - 2678400)
    ORDER BY time_stored
) A LIMIT 100;
-- Step 2: index the gathered keys
ALTER TABLE delete_keys ADD PRIMARY KEY (id);
-- Step 3: DELETE JOIN against the gathered keys
DELETE B.* FROM delete_keys
INNER JOIN my_table B USING (id);
-- Step 4: drop the gathered keys
DROP TABLE delete_keys;

If the key gathering takes less than 5 minutes, then run this sequence every 5 minutes.

Give it a Try !!!

UPDATE 2012-02-27 16:55 EDT

Here is something that should speed up key gathering a little more. Add the following index:

ALTER TABLE my_table ADD INDEX time_stored_id_ndx (time_stored,id);

This will better support the subquery that populates the delete_keys table because it provides a covering index, so the fields are retrieved from the index alone.

UPDATE 2012-02-27 16:59 EDT

Since you have to delete often, you may want to run this every two months:

OPTIMIZE TABLE my_table;

This will defrag the table after all those annoying little deletes every 5 minutes for two months

RolandoMySQLDBA
1

At my company, we have a similar situation. We have a table that contains keys that have an expiration. We have a cron that runs to clean that out:

DELETE FROM t1 WHERE expiration < UNIX_TIMESTAMP(NOW());

This ran once an hour, but we were having issues similar to what you are experiencing. We increased it to once per minute, and then to 6 times per minute: a cron job runs a bash script that executes the query, sleeps for a few seconds, and repeats until the minute is up.

The increased frequency significantly decreased the number of rows we were deleting per run, which relieved the contention. This is the route I would go.

However, if you find that you still have too many rows to delete, use a LIMIT and sleep between the chunks. For example, if you have 50k rows to delete, delete them in 10k chunks with a 2-second sleep between them. This keeps the queries from stacking up and allows the server to perform normal operations between these bulk deletes.
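A minimal sketch of that chunk-and-sleep loop done entirely inside MySQL as a stored procedure (the table t1, the 10k chunk size, and the 2-second sleep come from this answer; the procedure itself is an assumption, not the author's implementation):

DELIMITER //
CREATE PROCEDURE purge_expired()
BEGIN
    DECLARE deleted INT DEFAULT 1;
    WHILE deleted > 0 DO
        -- delete one chunk of expired rows
        DELETE FROM t1 WHERE expiration < UNIX_TIMESTAMP() LIMIT 10000;
        SET deleted = ROW_COUNT();  -- rows removed by the DELETE above
        DO SLEEP(2);                -- give normal traffic a window between chunks
    END WHILE;
END //
DELIMITER ;

The existing cron job can then simply CALL purge_expired() instead of issuing the raw DELETE.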

Jericon
  • Yep, that's the solution we have landed on so far. Hoping for something a bit more scalable. It seems like there should be a better way to do this especially since all the data to be deleted should be grouped together on disk since it was all inserted in sequence. – Brad Dwyer Feb 27 '12 at 21:19
1

You may want to consider introducing a master/slave (replication) solution into your design. If you shift all the read traffic to the slave, you open up the master to handle 'on-the-fly' CRUD activities, which then replicate down to the slave (your read server).

And because you are deleting so many records, you may want to consider running an OPTIMIZE on the table(s) from which the rows are being deleted.

Mike Purcell
0

Ended up using this to keep only the last 100 rows in place, so there is no significant lag when it is executed frequently (every minute):

-- Delete every row whose ID is not among the 100 newest
DELETE a FROM tbl a LEFT JOIN (
    SELECT ID
    FROM tbl
    ORDER BY ID DESC LIMIT 100
) b ON a.ID = b.ID
WHERE b.ID IS NULL;
Robert Sinclair