
My MySQL application is experiencing slow performance when running some UPDATE, INSERT and DELETE queries. In this question, I will only discuss one particular UPDATE, because it's enough to demonstrate the problem:

UPDATE projects SET ring = 5 WHERE id = 1

This UPDATE is usually fast enough, around 0.2 ms, but every now and then (often enough to be a problem) it takes several seconds. Here's an excerpt from the log (look at the fourth line):

 ~ (0.000282) UPDATE `projects` SET `ring` = 5 WHERE `id` = 1
 ~ (0.000214) UPDATE `projects` SET `ring` = 6 WHERE `id` = 1
 ~ (0.000238) UPDATE `projects` SET `ring` = 7 WHERE `id` = 1
 ~ (3.986502) UPDATE `projects` SET `ring` = 8 WHERE `id` = 1
 ~ (0.000186) UPDATE `projects` SET `ring` = 9 WHERE `id` = 1
 ~ (0.000217) UPDATE `projects` SET `ring` = 0 WHERE `id` = 1
 ~ (0.000162) UPDATE `projects` SET `ring` = 1 WHERE `id` = 1

projects is an InnoDB table with six columns of types INT and VARCHAR, 17 rows, and an index on id. The problem happens with other tables too, but here I'm focusing on this one. When trying to solve the problem, I made sure the queries were all sequential, so this is not a lock issue (a way to confirm that is sketched below, after the server details). The UPDATE above is executed in the context of a transaction. Other information on the server:

  • VPS with 4GB RAM (was 1GB), 12GB free disk space
  • CentOS 5.8 (was 5.7)
  • MySQL 5.5.10 (was 5.0.x)

The "was" bit above means it didn't work before or after the upgrade.

What I've tried so far, to no avail:

  • Setting innodb_flush_log_at_trx_commit to 0, 1 or 2
  • Setting innodb_locks_unsafe_for_binlog on or off
  • Setting timed_mutexes on or off
  • Changing innodb_flush_method from the default to O_DSYNC or O_DIRECT
  • Increasing innodb_buffer_pool_size from the default to 600M and then to 3000M
  • Increasing innodb_log_file_size from the default to 128M
  • Compiling MySQL from source
  • Running SHOW PROCESSLIST, which informs me that the state is "updating"
  • Running SHOW PROFILE ALL, which says that almost all the time was spent in the "updating" step, and that, within that step, not much time was spent on CPU cycles and there were many voluntary context switches (around 30); a sketch of the profiling session follows this list
  • Monitoring SHOW STATUS for changes in Innodb_buffer_pool_pages_dirty. There may be some relation between dirty pages being flushed and the slow queries, but the correlation isn't clear.
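
For reference, the InnoDB settings above were tried in my.cnf along these lines (a sketch; the values shown are the last ones I tried):

[mysqld]
innodb_flush_log_at_trx_commit = 0
innodb_locks_unsafe_for_binlog = 1
timed_mutexes = 1
innodb_flush_method = O_DIRECT
innodb_buffer_pool_size = 3000M
innodb_log_file_size = 128M

And the profiling session was roughly the following (all statements have to run in the same session for SET profiling to apply; the database name is a placeholder):

mysql mydb -e "
SET profiling = 1;
UPDATE projects SET ring = 5 WHERE id = 1;
SHOW PROFILES;
SHOW PROFILE ALL FOR QUERY 1;"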

Then I decided to check the system's I/O latency with ioping. This is my first VPS, so I was surprised to see this result:

4096 bytes from . (vzfs /dev/vzfs): request=1 time=249.2 ms
4096 bytes from . (vzfs /dev/vzfs): request=2 time=12.3 ms
4096 bytes from . (vzfs /dev/vzfs): request=3 time=110.5 ms
4096 bytes from . (vzfs /dev/vzfs): request=4 time=232.8 ms
4096 bytes from . (vzfs /dev/vzfs): request=5 time=294.4 ms
4096 bytes from . (vzfs /dev/vzfs): request=6 time=704.7 ms
4096 bytes from . (vzfs /dev/vzfs): request=7 time=1115.0 ms
4096 bytes from . (vzfs /dev/vzfs): request=8 time=209.7 ms
4096 bytes from . (vzfs /dev/vzfs): request=9 time=64.2 ms
4096 bytes from . (vzfs /dev/vzfs): request=10 time=396.2 ms

Pretty erratic, I would say.
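
For anyone wanting to reproduce the measurement, it was roughly the following (a sketch; ioping issues 4 KiB requests by default, and the dd cross-check assumes GNU coreutils):

# ten I/O requests against the current directory
ioping -c 10 .

# cross-check with synchronous 4 KiB writes
dd if=/dev/zero of=./ioping-test.tmp bs=4k count=256 oflag=dsync
rm -f ./ioping-test.tmp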

Having said all of that, I ask:

  1. Can the I/O latency be occasionally killing MySQL performance? I always thought that, when you ran an UPDATE, the thread handling that connection wouldn't flush data to disk or wait for such a flush; it would return immediately, and the flushing would be done by another thread at another time.

  2. If it can't be disk I/O, is there anything else I can possibly try, short of renting a dedicated server?

Adiel Mittmann
  • Is it the COMMIT that takes 5 seconds, or the actual UPDATE? – Joachim Isaksson Mar 11 '12 at 14:21
  • `UPDATE` should normally, just as you're saying, not flush to disk; only `COMMIT` would flush the log while running in your thread (and innodb_flush_log_at_trx_commit=0 should turn even that off). Your I/O latency is definitely out of control(!), but unless the load is high enough to prevent caching of the 17-row table for reads, I can't see how it would be the cause of this exact problem. – Joachim Isaksson Mar 11 '12 at 14:34
  • Exactly: even this out-of-control latency shouldn't cause such a delay! The load on my VPS is only me using it. Along with this UPDATE, some SELECTs are performed on a table with more than 1 million rows, and those are always fast. – Adiel Mittmann Mar 11 '12 at 14:47
  • It's possible that your final suggestion (re: dedicated server) hit the nail on the head. Short of using a dedicated server, are you able to move your VPS to a different physical host? – aingram Mar 11 '12 at 20:54
  • I don't know if my provider, myhosting.com, would move my VPS around their servers. I'm trying to avoid the dedicated server option because they're much more expensive (this is for a small, personal project). I could switch to another provider, but -- and this is what *really* bothers me -- I haven't been able to determine the cause of the problem, so I don't know if moving to another provider would fix it. – Adiel Mittmann Mar 12 '12 at 02:21

4 Answers


I'm replying to my own question with additional data that I collected based on your answers.

I used two notebooks connected over a wireless network. On notebook A, I mounted a directory of notebook B using sshfs. Then on notebook A I started up MySQL, specifying that mounted directory as its data directory. This should provide MySQL with a very slow I/O device. MySQL was started with innodb_flush_log_at_trx_commit = 0.
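
Roughly, the setup looked like this (a sketch; hostnames, users, and paths are placeholders):

# on notebook A: mount a directory of notebook B over the wireless link
mkdir -p /mnt/slowdata
sshfs userB@notebookB:/home/userB/mysql-data /mnt/slowdata

# start MySQL with the slow mount as its data directory
mysqld --datadir=/mnt/slowdata --innodb_flush_log_at_trx_commit=0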

I defined three sets of queries, each consisting of an update and a select repeated 10,000 times, without explicit transactions; a sketch of the driver loop follows the list. The experiments were:

  • US1SID: update and select on a specific row of the same table. The same row was used in all iterations.
  • US1MID: update and select on a specific row of the same table. The row was a different one in each iteration.
  • US2MID: update and select on rows of different tables. In this case, the table being read by the select didn't change at all during the experiment.
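
The driver for US1SID looked roughly like this (a sketch; the database name is a placeholder, and each line of the log records the statement type and its elapsed time):

#!/bin/bash
# US1SID sketch: same row, 10,000 iterations, no explicit transactions
run() {  # run one statement and log its tag and elapsed seconds
    local tag=$1 sql=$2 t0 t1
    t0=$(date +%s.%N)
    mysql testdb -e "$sql" >/dev/null
    t1=$(date +%s.%N)
    echo "$tag $(echo "$t1 $t0" | awk '{printf "%.4f", $1 - $2}')" >> timings.log
}
for i in $(seq 1 10000); do
    run u "UPDATE projects SET ring = $((i % 10)) WHERE id = 1"
    run s "SELECT * FROM projects WHERE id = 1"
done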

Each set was run twice using a shell script (hence timings are slower than those in my original question): once under normal conditions and once after executing the following command:

tc qdisc replace dev wlan0 root handle 1:0 netem delay 200ms

The command above adds a mean delay of 200ms when transmitting packets through wlan0.
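
To restore normal conditions afterwards, the qdisc is simply deleted again:

tc qdisc del dev wlan0 root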

First, here are the mean times, in seconds, of the fastest 99% of updates and selects, and of the slowest 1%.

          |        Delay: 0ms        |       Delay: 200ms       |
          | US1SID | US1MID | US2MID | US1SID | US1MID | US2MID |
| top99%u | 0.0064 | 0.0064 | 0.0064 | 0.0063 | 0.0063 | 0.0063 |
| top99%s | 0.0062 | 0.0063 | 0.0063 | 0.0062 | 0.0062 | 0.0062 |
| bot01%u | 1.1834 | 1.2239 | 0.9561 | 1.9461 | 1.7492 | 1.9731 |
| bot01%s | 0.4600 | 0.5391 | 0.3417 | 1.4424 | 1.1557 | 1.6426 |
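
The percentile means were computed from the timing log along these lines (a sketch, assuming the two-field "tag seconds" log format from the driver script above):

# fastest 99% of updates (field 1 = tag, field 2 = seconds)
grep '^u ' timings.log | sort -k2,2n > u.sorted
n=$(wc -l < u.sorted)
head -n $((n * 99 / 100)) u.sorted | awk '{s += $2} END {printf "%.4f\n", s / NR}'

# slowest 1% of updates
tail -n $((n / 100)) u.sorted | awk '{s += $2} END {printf "%.4f\n", s / NR}'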

As is clear, even with really poor I/O performance, MySQL manages to execute most queries very fast. But what concerns me most are the worst cases, so here's another table, showing the 10 slowest queries. A "u" means it was an update, an "s" a select.

|          Delay: 0ms         |          Delay: 200ms          |
| US1SID  | US1MID  | US2MID  | US1SID   | US1MID   | US2MID   |
| 5.443 u | 5.946 u | 5.315 u | 11.500 u | 10.860 u | 11.424 s |
| 5.581 u | 5.954 s | 5.466 u | 11.649 s | 10.995 u | 11.496 s |
| 5.863 s | 6.291 u | 5.658 u | 12.551 s | 11.020 u | 12.221 s |
| 6.192 u | 6.513 u | 5.685 u | 12.893 s | 11.370 s | 12.599 u |
| 6.560 u | 6.521 u | 5.736 u | 13.526 u | 11.387 u | 12.803 u |
| 6.562 u | 6.555 u | 5.743 u | 13.997 s | 11.497 u | 12.920 u |
| 6.872 u | 6.575 u | 5.869 u | 14.662 u | 12.825 u | 13.625 u |
| 6.887 u | 7.908 u | 5.996 u | 19.953 u | 12.860 u | 13.828 s |
| 6.937 u | 8.100 u | 6.330 u | 20.623 u | 14.015 u | 16.292 u |
| 8.665 u | 8.298 u | 6.893 u | 27.102 u | 22.042 s | 17.131 u |
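
The ten slowest entries per run come straight from the same log:

sort -k2,2rn timings.log | head -n 10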

Conclusions:

  1. Poor I/O performance can indeed slow MySQL to a crawl. It's not clear why or when exactly, but it does happen.

  2. The slowing down applies to both selects and updates, with updates suffering more.

  3. For some reason, selects on a table that wasn't involved in any changes, and which had recently been populated, were also slowed down, as US2MID above shows.

  4. As for the test cases proposed by mentatkgs, it seems that updating different rows instead of the same ones does help a little, but doesn't solve the problem.

I guess I will either adapt my software to tolerate such delays or try to move to another provider. Renting a dedicated server is too expensive for this project.

Thank you all for the comments.

Adiel Mittmann

As you are hosting your VPS in the cloud, you might be running into issues that are completely out of your control.

VPSs are subject to the whims of the host servers that run them. For example, CPU cycle priority at the Rackspace Cloud is weighted based on the size of the VPS: the bigger your VPS, the better the likelihood that your app will perform smoothly. If there is a bigger VPS on the host you're using, weighted bursting may be to blame. It's pretty hard to say.

Have you tried running this locally on your own machine? If it runs perfectly on your own system, and you need guaranteed performance, then your best bet will be to move to a dedicated server.

Matt Beckman
  • On other machines it runs fine. Performance is important for my application, but since renting a dedicated server would be much more expensive, at least for the time being I will try to adapt the software to these delays. In a few minutes I'll post some other results I obtained that helped clarify the issue. – Adiel Mittmann Mar 14 '12 at 03:32

You have a VPS-related I/O problem. It's not MySQL's fault.

Are you by any chance using Amazon's Elastic Block Store, or possibly RDS? Both use remote storage and an IP protocol layer to talk to it; they can have nasty lag at times.

Peter V
  • I'm using Myhosting.com. I never thought this was, e.g., a bug in MySQL. I just don't quite understand its behavior. (Basically, if a table is in memory, why does MySQL block when updating it?) – Adiel Mittmann Mar 14 '12 at 03:36
  • A MySQL table is never guaranteed to be "in memory", although MySQL will work hard to keep indices in memory. If you need a true in-memory DB, then depending on your requirements, you could check out SQLite (not good for concurrency, but fast and easy for many things), a NoSQL solution like Redis, or some of the more modern column-store databases, which offer a bit more control here. – Peter V Mar 15 '12 at 16:34

Question 1) Yes.

To check it, write two test apps:

Test case 1 will do this every minute for a few hours:

UPDATE `projects` SET `ring` = 5 WHERE `id` = 1
UPDATE `projects` SET `ring` = 6 WHERE `id` = 1

Test case 2 will do this every minute for a few hours:

UPDATE `projects` SET `ring` = 7 WHERE `id` = 1
UPDATE `projects` SET `ring` = 8 WHERE `id` = 2

Test case 1 should have a delay, while test case 2 should not.
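
A minimal way to drive both cases (a sketch; the database name is a placeholder):

# test case 1: both updates hit the same row
while true; do
    mysql testdb -e "UPDATE projects SET ring = 5 WHERE id = 1;
                     UPDATE projects SET ring = 6 WHERE id = 1;"
    sleep 60
done

# test case 2: the updates hit different rows
while true; do
    mysql testdb -e "UPDATE projects SET ring = 7 WHERE id = 1;
                     UPDATE projects SET ring = 8 WHERE id = 2;"
    sleep 60
done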

Question 2) Use a NoSQL database.

mentatkgs
  • In a few minutes I'm going to post a reply answering my question. I ran some experiments the results of which partly support your arguments. Using NoSQL is out of the question though :P – Adiel Mittmann Mar 14 '12 at 03:37