0

I need to delete a lot of rows in multiple tables that are older than certain date. Actually I'm doing this:

DELETE FROM trash1 WHERE deletion_date < DATE_SUB(NOW(), INTERVAL 48 HOUR);
DELETE FROM trash2 WHERE deletion_date < DATE_SUB(NOW(), INTERVAL 48 HOUR);
DELETE FROM trash3 WHERE deletion_date < DATE_SUB(NOW(), INTERVAL 48 HOUR);

Is there any approach to make it better or faster? Is it possible to do it in only one query? Thanks.

DoNotArrestMe
  • 1,285
  • 1
  • 9
  • 20
Formiga
  • 113
  • 1
  • 12
  • I trust your tables are not really called `trash1`, `trash2` and `trash3`? – Schwern Feb 02 '14 at 20:35
  • Approximately how many rows per table are being deleted? Do you already have an index on deletion_date? What storage engine are these tables using? – John McMahon Feb 02 '14 at 20:35
  • Possible duplicate: http://stackoverflow.com/questions/1318972/deleting-millions-of-rows-in-mysql – John McMahon Feb 02 '14 at 20:42
  • @JohnMcMahon, the number varies a lot, because it is a trash of files, and other things related to a cloud storage. I already have an index on deletion_date and the storage engine is InnoDB, actually using MariaDB, check my update on question. – Formiga Feb 03 '14 at 02:22
  • @Schwern, does it really matter? The name of the tables are really different, of course, but it is related to 3 trashes of different kind of data I'm processing, but all three shares the deletion_date. – Formiga Feb 03 '14 at 02:36
  • @Formiga Just checking there wasn't a second problem (naming conventions) hidden in your first. I had a thought, since you're doing the same operation simultaneously to three tables, one option is to consider if you need three tables or if you can do it with one. Another thought, have you actually measured these queries are slow? – Schwern Feb 06 '14 at 21:16
  • Possible duplicate: https://stackoverflow.com/questions/3331992/how-to-delete-from-multiple-tables-in-mysql – Schwern Feb 06 '14 at 21:22
  • @Schwern, I really need the three tables, because they are very different. The only thing they share is the deletion_date. I didn't tested it in large scale, I will need to do that on beta tests. But if you think this way is fine, I trust in you. – Formiga Feb 07 '14 at 21:42

1 Answers1

1

First thing I would check is the efficiency of the query WHERE deletion_date < DATE_SUB(NOW(), INTERVAL 48 HOUR) on those tables using EXPLAIN.

EXPLAIN DELETE FROM trash1 WHERE deletion_date < DATE_SUB(NOW(), INTERVAL 48 HOUR);

And so on. You're looking to see if the query is using an indexed column or not and how many rows it has to search through. You want to avoid searching every row in the table.

Next thing to check is whether deletion_date is indexed and the type of index. Because this is a range query (ie. "less than") rather than a simple equality check the type of index can matter. There's a discussion about optimizing range queries in the MySQL manual.

Start with that, see how it goes. If you have trouble, post the output of your EXPLAINs.

Schwern
  • 153,029
  • 25
  • 195
  • 336
  • Thanks for the reply. I can't see other way to do that. I followed the instructions here: http://dev.mysql.com/doc/refman/5.7/en/date-and-time-functions.html. deletion_date is indexed and it's type is mysql timestamp. – Formiga Feb 03 '14 at 02:35
  • 1
    @Formiga Have you measured the performance of your queries? Or are they performing ok and you're only concerned there might be a better way? If they're well indexed and operating fast enough, what you're doing is fine. – Schwern Feb 06 '14 at 21:20
  • on my tests, a little limited in quantity of rows, performance is good. I was only concerned there might be a better way. Thanks for your help. I will double check the indexes. – Formiga Feb 07 '14 at 21:43