
Each week a cronjob fires that deletes about 500,000 records from a MySQL table that uses the InnoDB engine. The table contains web log data for weekly processing; afterwards, data that is no longer needed is deleted. There are 3 indexes on this table (no foreign keys), including one on time, which is a Unix timestamp.

DELETE FROM global_viewlog WHERE time<1354391592

The problem is that when this query ran last week it took over 2 hours to complete, and during that time my entire server was lagging. 'iotop' revealed that the hard disks were being written to by MySQL very heavily, and the server load increased dramatically to unhealthy levels.

The table gets written to on every click. Is there an elephant in the room that I'm not seeing? I.e., something blindingly obvious to optimize this weekly query?

Thanks

Onst
  • What is your log file size? If it's small, then you'll get buffered writing of the logs to the disk, which may explain the hard disk thrashing. – Burhan Khalid Dec 01 '12 at 20:05
  • Are you using transactions to do this, or just a naive delete? What indices do you have on the data? – hd1 Dec 01 '12 at 20:58
  • Possible duplicate of [What is the best way to delete old rows from MySQL on a rolling basis?](http://stackoverflow.com/questions/9472167/what-is-the-best-way-to-delete-old-rows-from-mysql-on-a-rolling-basis) – Amir Ali Akbari Dec 13 '16 at 14:23

2 Answers


If you regularly delete old data based on time, partitioning the table on that same column will make it much faster. It separates different time periods into different areas on disk, so a delete becomes dropping an entire partition instead of examining individual rows.

http://dev.mysql.com/doc/refman/5.5/en/partitioning.html
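For illustration, here is a minimal sketch of what a partitioned version of the table might look like. The column list, partition names and boundary timestamps are invented for the example, and it assumes the primary key can be made to include time (MySQL requires the partitioning column to be part of every unique key):

  CREATE TABLE global_viewlog (
      id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
      time INT UNSIGNED NOT NULL,             -- unix timestamp, as in the question
      url  VARCHAR(255) NOT NULL,             -- hypothetical log column
      PRIMARY KEY (id, time),                 -- partitioning column must be in every unique key
      KEY idx_time (time)
  ) ENGINE=InnoDB
  PARTITION BY RANGE (time) (
      PARTITION p2012w48 VALUES LESS THAN (1354320000),  -- example weekly boundary
      PARTITION p2012w49 VALUES LESS THAN (1354924800),  -- example weekly boundary
      PARTITION pmax     VALUES LESS THAN MAXVALUE
  );

  -- The weekly cleanup becomes a metadata operation instead of row-by-row deletes:
  ALTER TABLE global_viewlog DROP PARTITION p2012w48;

Dropping a partition simply discards that partition's data, so it finishes in a fraction of the time a 500,000-row DELETE takes and generates far less I/O.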

Erik Ekman

You're using InnoDB, so there's an implicit transaction. You're deleting a whole mess of records, so the transaction is big. @Erik Ekman is right: if you can get partitioning to work, it's a good way to handle this.

But here's another good way. Try doing your delete in smaller batches, like so.

  DELETE
    FROM global_viewlog
   WHERE time<1354391592
   LIMIT 1000

Keep issuing this statement until nothing is left to delete. Most client libraries (JDBC, etc.) return the number of records affected. Or you can just issue the statement a thousand times from a script if you want to do it the cheap way.

(You can fiddle with the number of records in the LIMIT statement.) It may or may not take more elapsed time than your single delete statement, but it won't tie up your server as much.
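If you want to keep that loop inside MySQL itself rather than in a client script, a stored procedure is one way to do it. This is just a sketch: the procedure name is made up, and the batch size of 1000 and the cutoff value are the ones used above, so adjust both to taste.

  DELIMITER //
  CREATE PROCEDURE purge_viewlog(IN cutoff INT)
  BEGIN
      DECLARE affected INT DEFAULT 1;
      WHILE affected > 0 DO
          -- With autocommit on, each small batch commits as its own short transaction
          DELETE FROM global_viewlog WHERE time < cutoff LIMIT 1000;
          SET affected = ROW_COUNT();   -- rows deleted by the last statement
      END WHILE;
  END //
  DELIMITER ;

  CALL purge_viewlog(1354391592);

You could also add something like DO SLEEP(1); between batches to give production queries more breathing room.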

Or try changing this table's storage engine to MyISAM and using this:

  DELETE LOW_PRIORITY QUICK
    FROM global_viewlog
   WHERE time<1354391592

This will keep your cleanup operation from interfering with production.

Change your cronjob's frequency from once a week to once a day. This will reduce the size of your delete batches so you don't have such a whopping mess to clean up.

O. Jones
  • Thanks for your suggestion Ollie. Changing the table to MyISAM would require the least effort on my part, but how would this scale? I.e., if I get 10 million pageviews per week, will the MyISAM table be able to delete with low priority without causing hassle to the rest of the server? Thanks. – Onst Dec 07 '12 at 07:21
  • In addition to your solution here, would an INSERT DELAYED query be good for the inserts that are going on at the same time as the DELETE LOW_PRIORITY? Thanks. – Onst Dec 07 '12 at 08:13
  • Yes, INSERT DELAYED will work if you switch to MyISAM. But the problem is the huge InnoDB delete transaction. – O. Jones Dec 07 '12 at 19:38
  • This was the solution. I changed to MyISAM, used INSERT DELAYED and then DELETE LOW_PRIORITY, which now causes no issues. Thanks! – Onst Mar 04 '13 at 19:24
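For completeness, the INSERT DELAYED mentioned in these comments is just a keyword added to the normal insert the application already does; a hypothetical logging insert (the column names are invented here) would look like this. Note that INSERT DELAYED only works with MyISAM-style engines and was deprecated in later MySQL releases:

  INSERT DELAYED INTO global_viewlog (time, url, ip)
  VALUES (UNIX_TIMESTAMP(), '/some/page', '203.0.113.7');

The DELAYED keyword lets the client return immediately while the server queues the row for writing, which pairs with DELETE LOW_PRIORITY to keep logging from blocking during cleanup.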