The message "too long wal write" means that too much time has elapsed between writing updates to the .xlog file. Here "too much" means more than the value of Tarantool's too_long_threshold configuration parameter.
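The threshold is set with box.cfg. Below is a minimal configuration fragment that keeps the value at 0.5 seconds, which is also Tarantool's default:

```lua
-- Warn in the log (e.g. "too long WAL write") when a write takes longer
-- than 0.5 seconds. 0.5 is also the default value.
box.cfg{too_long_threshold = 0.5}
```

The parameter is dynamic, so a later box.cfg{} call on a running instance can change it without a restart.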
There are two common causes: (1) a slow disk, or (2) problems on the application side.
To find out which one you are facing, run atop with a 1-second interval and check what happened during the "too long" events: high disk utilization points to disk issues, while high CPU utilization points to application issues.
The recommended solution for slow disk issues is to write changes to the write-ahead log in batches, wrapping every batch in a single transaction. This gives you just one disk write per transaction. You don't need any yields in this case (see the notes about fiber.yield further on).
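A minimal sketch of this batching idea, assuming a memtx space and the box.begin(), box.commit() and box.rollback() transaction calls. The helper name replace_batch and its arguments are hypothetical. All replaces in the batch are written to the WAL together when box.commit() runs, and nothing inside the batch yields.

```lua
-- Hypothetical helper: write a whole batch of tuples in one transaction,
-- so the WAL gets one write for the batch instead of one per tuple.
local function replace_batch(space, tuples)
    box.begin()
    -- Do not call fiber.yield() (or anything else that yields) in here:
    -- by default a yield aborts an active memtx transaction.
    local ok, err = pcall(function()
        for _, t in ipairs(tuples) do
            space:replace(t)
        end
    end)
    if not ok then
        box.rollback() -- discard the partially applied batch
        error(err)
    end
    box.commit() -- one WAL write for the entire batch
end
```

If any tuple in the batch fails, the whole batch is rolled back, so the space never ends up with half a batch.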
Typical application issues are as follows:
- You have launched too many fibers, so with so many successive fiber switches, too much time may elapse before the next WAL write.
- You don't yield within time-consuming operations (such as a full scan, deleting a huge number of records, etc.).
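One way to avoid launching a fiber per task is a fixed pool of worker fibers fed through a bounded fiber.channel. This is a swapped-in technique, not something the text above prescribes; the helper run_pool, the pool size you pass to it and the STOP sentinel are all hypothetical and should be tuned for your workload.

```lua
local fiber = require('fiber')
local log = require('log')

-- Hypothetical helper: process `items` with a fixed pool of `workers`
-- fibers instead of creating one fiber per item.
local function run_pool(items, workers, handler)
    local queue = fiber.channel(workers) -- bounded: put() yields when full
    local done = fiber.channel(workers)  -- each worker reports here on exit
    local STOP = {}                      -- unique sentinel: tells a worker to exit

    for _ = 1, workers do
        fiber.create(function()
            while true do
                local item = queue:get()
                if item == STOP then break end
                -- pcall keeps one failing item from killing the worker
                local ok, err = pcall(handler, item)
                if not ok then
                    log.error('item failed: %s', tostring(err))
                end
            end
            done:put(true)
        end)
    end

    for _, item in ipairs(items) do
        queue:put(item) -- waits while all workers are busy
    end
    for _ = 1, workers do
        queue:put(STOP)
    end
    for _ = 1, workers do
        done:get() -- wait until every worker has finished
    end
end
```

However many items arrive, at most `workers` fibers compete for the CPU, which keeps the number of fiber switches bounded.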
Notes on yields:
- You need to make explicit yields using fiber.yield().
- You don't need to move time-consuming operations to a dedicated fiber; you can just as well run them within the main loop: load the fiber module with require('fiber') and occasionally yield control within your program's loop (not too often, though: several times per too_long_threshold interval is quite enough).
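A minimal sketch of yielding a few times per too_long_threshold inside a long loop that runs in the current fiber. The helper name process_with_yields and its interval argument are hypothetical. It reads time from clock.monotonic() rather than from the fiber module's clocks, because those are tied to the event loop and may not advance while a loop never yields.

```lua
local fiber = require('fiber')
local clock = require('clock')

-- Hypothetical helper: call handler(item) for each item, yielding whenever
-- at least `interval` seconds have passed since the last yield.
local function process_with_yields(items, handler, interval)
    local last_yield = clock.monotonic()
    for _, item in ipairs(items) do
        handler(item)
        if clock.monotonic() - last_yield >= interval then
            fiber.yield() -- give other fibers a chance to run
            last_yield = clock.monotonic()
        end
    end
end
```

For example, passing box.cfg.too_long_threshold / 4 as the interval (readable once box.cfg{} has run) makes the loop yield about four times per threshold period, in line with the note above.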
As you optimize your application code, remember that one Tarantool instance can use only one CPU core, so adding CPU cores won't help; the only solution is to make sure the fibers yield control to each other properly.