MySQL blocks queries for over an hour

Question

We have a MySQL (5.7.19-0ubuntu0.16.04.1-log) instance hosted on Digital Ocean. Specifically the data folder is in a DO Volume (aka Block Storage).

A few days ago, we've noticed that MySQL has frozen up for about 1 hour. During this time, our web server basically froze up as well and all requests timed out.

Upon investigating our logs, the only clue we got was:

2017-09-30T13:40:11.518265Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 3718180ms. The settings might not be optimal. (flushed=199 and evicted=0, during the time.)

As I understand from this answer, this means that it was flushing 199 pages, each of which was 16KB, which means this was only ~3MB of data to be flushed. This should not have needed an hour to complete. Unfortunately, we were not around during the downtime and we didn't get a chance to run SHOW ENGINE INNODB STATUS

We suspect it to be a performance issue with DigitalOcean's block storage vs using the machine's own SSD, but DO support suggested that this shouldn't have been an issue.

At this point, we're totally lost on what could be the cause of the issue. We're not even sure if page_cleaner is the cause, or if it's also a victim of a deeper problem.

I'd like to understand why this happened and what we can do to prevent it in the future. Any pointers or ways to debug this issue would be greatly appreciated, thanks!

========== More Info ==========

As we continue to investigate, it seems like problems similar to this has been happening, but we've almost never seen it due to the issues only taking for a few seconds:

2017-09-29T07:34:43.346581Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 11628ms. The settings might not be optimal. (flushed=9 and evicted=0, during the time.)
2017-09-29T13:15:04.859060Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 7326ms. The settings might not be optimal. (flushed=111 and evicted=0, during the time.)
2017-09-29T22:45:14.299615Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5621ms. The settings might not be optimal. (flushed=396 and evicted=0, during the time.)
2017-09-30T02:02:42.000685Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4119ms. The settings might not be optimal. (flushed=10 and evicted=0, during the time.)
2017-09-30T02:02:54.424966Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4704ms. The settings might not be optimal. (flushed=9 and evicted=0, during the time.)
2017-09-30T03:36:01.500662Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 7096ms. The settings might not be optimal. (flushed=9 and evicted=0, during the time.)
2017-09-30T03:36:07.747733Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4100ms. The settings might not be optimal. (flushed=9 and evicted=0, during the time.)

We've since tweaked innodb_lru_scan_depth to 256 as suggested by this, but we have not seen any improvements.

*"DO support suggested that this shouldn't have been an issue."* Well, perhaps it *shouldn't*, but that isn't the same as it *wasn't*. I only recalling ever seeing this behavior on systems with dramatically overloaded or failing disk hardware. Watching the output of `iostat -x 1` might be helpful. — Michael - sqlbot, Oct 03 '17 at 19:23
Please provide `SELECT * FROM information_schema.INNODB_METRICS WHERE name LIKE 'buffer_%' ;` -- I'll see if I can get some clues from there. — Rick James, Oct 07 '17 at 01:34
anyone ever answer this? I have similar issue. 201 pages flushed took 5 minutes — Garr Godfrey, Dec 01 '18 at 01:53
@GarrGodfrey - Start a new question with _your_ specifics; the differences might lead to an answer. — Rick James, Apr 16 '19 at 21:28

MySQL blocks queries for over an hour

0 Answers0