tarantool long WAL write

Question

I'm using Tarantool. Why do I get these strange messages in the log?

2016-03-24 16:19:58.987 [5803] main/493623/http/XXX.XXX.XXX.XXX:57295 txn.cc:214 W> too long WAL write: 0.527 sec
2016-03-24 16:20:09.841 [5803] main/493714/http/XXX.XXX.XXX.XXX:57346 txn.cc:214 W> too long WAL write: 0.605 sec
2016-03-24 16:20:12.988 [5803] main/493716/http/XXX.XXX.XXX.XXX:57347 txn.cc:214 W> too long WAL write: 1.682 sec
2016-03-24 16:20:15.023 [5803] main/493717/http/XXX.XXX.XXX.XXX:37825 txn.cc:214 W> too long WAL write: 3.373 sec
2016-03-24 16:20:35.145 [5803] main/494145/http/

score 3 · Accepted Answer · edited May 23 '17 at 11:58


After direct on-site help and debugging with agent-0007, we have found several issues.

Most of them were related to a slow virtual environment (OpenVZ was used), which showed inadequate I/O timings.

This problem is also related to the question "Tarantool sphia make slow selects?".

Additionally, there is a recommendation for slow disks: if possible, place the WAL on a separate disk from Tarantool snapshots and Sophia storage.

snap_dir, wal_dir and sophia_dir options: http://tarantool.org/doc/book/configuration/index.html#basic-parameters
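As a rough illustration of that recommendation, the three directories can be pointed at different mount points in box.cfg. The option names are the ones listed above (Tarantool 1.6 era); the paths are hypothetical placeholders, not values from the original setup.

```lua
-- Sketch only: the mount points below are hypothetical.
-- Put the WAL on its own disk so that snapshot and Sophia I/O
-- do not compete with WAL writes.
box.cfg{
    wal_dir    = '/mnt/disk1/tarantool/wal',    -- write-ahead log (.xlog files)
    snap_dir   = '/mnt/disk2/tarantool/snap',   -- snapshots (.snap files)
    sophia_dir = '/mnt/disk2/tarantool/sophia', -- Sophia engine storage
}
```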

Thanks.

answered Mar 28 '16 at 10:39

Dmitry S.


Thanks for your reply – agent-0007 Apr 04 '16 at 17:14

score 3 · Answer 2 · answered Jun 13 '19 at 15:06

The message "too long WAL write" means that too much time has elapsed between writing updates to the .xlog file ("too much" here meaning "more than specified in Tarantool's configuration parameter too_long_threshold").
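For reference, a minimal sketch of where this threshold is set. The value is in seconds; 0.5 is, to my knowledge, the default, which would match the smallest warnings in the question (0.527 sec). Raising it only hides the warning and does not make the writes faster.

```lua
-- Sketch: warn only when a request takes longer than 0.5 seconds.
-- 0.5 is assumed here to be the default value.
box.cfg{
    too_long_threshold = 0.5,
}
```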

There are two common reasons: 1) a slow disk; 2) problems on the application's side.

To figure out which one you are facing, launch atop with a 1 s interval and check what happened during the "too long" events: high disk utilization points to disk issues; high CPU utilization points to application issues.

The recommended solution for slow disk issues is to write changes to the write-ahead log in batches, where every batch is wrapped in a single transaction. This gives you just one disk write per transaction. You must not yield inside such a transaction (see the notes about fiber.yield further on).
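A minimal sketch of such batching, assuming box.cfg{} has already been called and that `space` is an existing space object. The helper name replace_batch is hypothetical.

```lua
-- Hypothetical helper: apply a whole batch inside one transaction,
-- so the batch reaches the WAL as a single write at commit time.
local function replace_batch(space, tuples)
    box.begin()
    local ok, err = pcall(function()
        for _, t in ipairs(tuples) do
            space:replace(t)   -- no yields between begin and commit
        end
    end)
    if not ok then
        box.rollback()         -- discard the partial batch on error
        error(err, 0)
    end
    box.commit()
end
```

It could be called as replace_batch(box.space.my_space, {{1, 'a'}, {2, 'b'}}), where my_space is a placeholder name. Keep batches reasonably small, since the loop itself does not yield.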

Typical application issues are as follows:

you launched too many fibers (so, because of successive fiber switches, too much time may elapse before the next WAL write);
you make no yields within time-consuming operations (such as a full-scan search, deleting a huge number of records, etc.).

Notes on yields:

You need to make explicit yields using fiber.yield().
You don't need to move time-consuming operations to a dedicated fiber; you can just as well run them within the main loop: call require('fiber') and occasionally yield control within your program cycle (not too often, though; several times per the interval specified in too_long_threshold is quite enough).
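The notes above can be sketched as a full scan that yields every so often. The names count_matching and YIELD_EVERY are hypothetical, and the batch size of 1000 is an arbitrary assumption to tune against your own too_long_threshold.

```lua
local fiber = require('fiber')

-- Hypothetical tuning knob: how many tuples to process between yields.
local YIELD_EVERY = 1000

-- Count tuples that satisfy `predicate`, yielding periodically so that
-- other fibers (and their WAL writes) are not starved during the scan.
local function count_matching(space, predicate)
    local matched, seen = 0, 0
    for _, tuple in space:pairs() do
        if predicate(tuple) then
            matched = matched + 1
        end
        seen = seen + 1
        if seen % YIELD_EVERY == 0 then
            fiber.yield()   -- explicit yield, as recommended above
        end
    end
    return matched
end
```

Note that other fibers may modify the space while the scan is yielded, so the result is not a consistent snapshot.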

As you optimize your application code, remember that one Tarantool instance can utilize only one CPU core, so increasing the number of CPU cores is useless — the only solution is to ensure proper control yields among the fibers.
