We have a MySQL (5.7.19-0ubuntu0.16.04.1-log) instance hosted on Digital Ocean. Specifically the data folder is in a DO Volume (aka Block Storage).
A few days ago, we've noticed that MySQL has frozen up for about 1 hour. During this time, our web server basically froze up as well and all requests timed out.
Upon investigating our logs, the only clue we got was:
2017-09-30T13:40:11.518265Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 3718180ms. The settings might not be optimal. (flushed=199 and evicted=0, during the time.)
As I understand from this answer, this means that it was flushing 199 pages, each of which was 16KB, which means this was only ~3MB of data to be flushed. This should not have needed an hour to complete. Unfortunately, we were not around during the downtime and we didn't get a chance to run SHOW ENGINE INNODB STATUS
We suspect it to be a performance issue with DigitalOcean's block storage vs using the machine's own SSD, but DO support suggested that this shouldn't have been an issue.
At this point, we're totally lost on what could be the cause of the issue. We're not even sure if page_cleaner is the cause, or if it's also a victim of a deeper problem.
I'd like to understand why this happened and what we can do to prevent it in the future. Any pointers or ways to debug this issue would be greatly appreciated, thanks!
========== More Info ==========
As we continue to investigate, it seems like problems similar to this has been happening, but we've almost never seen it due to the issues only taking for a few seconds:
2017-09-29T07:34:43.346581Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 11628ms. The settings might not be optimal. (flushed=9 and evicted=0, during the time.)
2017-09-29T13:15:04.859060Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 7326ms. The settings might not be optimal. (flushed=111 and evicted=0, during the time.)
2017-09-29T22:45:14.299615Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5621ms. The settings might not be optimal. (flushed=396 and evicted=0, during the time.)
2017-09-30T02:02:42.000685Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4119ms. The settings might not be optimal. (flushed=10 and evicted=0, during the time.)
2017-09-30T02:02:54.424966Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4704ms. The settings might not be optimal. (flushed=9 and evicted=0, during the time.)
2017-09-30T03:36:01.500662Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 7096ms. The settings might not be optimal. (flushed=9 and evicted=0, during the time.)
2017-09-30T03:36:07.747733Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4100ms. The settings might not be optimal. (flushed=9 and evicted=0, during the time.)
We've since tweaked innodb_lru_scan_depth
to 256 as suggested by this, but we have not seen any improvements.