Cassandra is configured to lose 10 seconds of data by default?

Question

The data in the commit log is flushed to disk periodically, every 10 seconds by default (controlled by commitlog_sync_period_in_ms). If all replicas crash within that 10-second window, will I lose all of that data? Does this mean that, theoretically, a Cassandra cluster can lose data?

Data is not sent to the memtable first! First it's appended to the commitlog and then it's stored in the memtable and then the ack is sent. Check the insert trace: https://www.datastax.com/dev/blog/tracing-in-cassandra-1-2 — Vishal Sharma, Apr 12 '18 at 07:41

score 10 · Accepted Answer · edited Sep 10 '22 at 09:53

If a node crashed right before updating the commit log on disk, then yes, you could lose up to ten seconds of data.

If you keep multiple replicas, by using a replication factor higher than 1 or by having multiple data centers, then much of the lost data will still be on other nodes, and will be recovered on the crashed node when it is repaired.

Also, the commit log may be written in less than ten seconds if the write volume is high enough to hit size limits before the ten seconds elapse.

If you want more durability than this (at the cost of higher latency), then you can change the commitlog_sync setting from periodic to batch. In batch mode it uses the commitlog_sync_batch_window_in_ms setting to control how often batches of writes are written to disk. In batch mode the writes are not acked until written to disk.
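To make the difference between the two modes concrete, here is a minimal Python sketch of a toy append-only log. It is not Cassandra's implementation: `TinyCommitLog`, `demo`, and the `unsynced` counter are hypothetical names, the periodic sync is checked inline rather than by a background thread, and the batch mode fsyncs on every write instead of grouping writes within a window. It only illustrates when a write is acknowledged relative to the fsync.

```python
import os
import tempfile
import time


class TinyCommitLog:
    """Toy append-only log (illustration only, not Cassandra's code)."""

    def __init__(self, path, mode="periodic", sync_period_s=10.0):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.mode = mode                    # "periodic" or "batch"
        self.sync_period_s = sync_period_s  # analogous to the 10 s default
        self.last_sync = time.monotonic()
        self.unsynced = 0                   # acked writes not yet fsynced

    def append(self, record: bytes) -> int:
        """Append a record and return how many acked writes are still unsynced."""
        os.write(self.fd, record + b"\n")   # reaches the OS, not necessarily the disk
        self.unsynced += 1
        if self.mode == "batch":
            # Simplification: fsync before every ack (no grouping window).
            self.sync()
        elif time.monotonic() - self.last_sync >= self.sync_period_s:
            # Simplification: periodic sync checked inline, not in a background thread.
            self.sync()
        return self.unsynced

    def sync(self):
        os.fsync(self.fd)                   # force the data to stable storage
        self.unsynced = 0
        self.last_sync = time.monotonic()

    def close(self):
        os.close(self.fd)


def demo(mode, n=5):
    """Append n records and report the at-risk count after each ack."""
    with tempfile.TemporaryDirectory() as d:
        log = TinyCommitLog(os.path.join(d, "commitlog"), mode=mode)
        at_risk = [log.append(b"row-%d" % i) for i in range(n)]
        log.close()
        return at_risk
```

In this sketch, `demo("periodic")` reports a growing number of acknowledged but unsynced writes until the period elapses, while `demo("batch")` always reports zero, at the cost of an fsync on the write path.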

The ten-second default for periodic mode is designed for spinning disks: they are slow enough that blocking acks while waiting for commit log writes causes a performance hit. For this reason, if you use batch mode, the usual recommendation is a dedicated disk for the commit log, so that the write head doesn't need to do any seeks and the added latency stays as low as possible.

If you are using SSDs, then you can use more aggressive timing since the latency is greatly reduced compared to a spinning disk.

As far as I understand, the commit log is already on disk, so even if a node crashes in under 10 seconds and then restarts, shouldn't it replay everything from the commit log and recover the data? — Vinay, Dec 16 '16 at 06:21
@Vinay The data gets written to the disk only after every 10 seconds. Therefore, "you can potentially lose up to that much data if all replicas crash within that window of time." Please check out: https://wiki.apache.org/cassandra/Durability — Vishal Sharma, Apr 12 '18 at 07:50
"written to disk" should be "fsynced" everywhere. Commit log writes happen before the mutation is completed, but the commitlog itself is not fsynced on every write. The data loss is only expected if the crash is a machine level crash, not a process level one. — Nitsan Wakart, Aug 20 '18 at 08:19
Looks like http://www.mongodb-is-web-scale.com/ applies to Cassandra too — csauve, Oct 22 '18 at 21:39

score 4 · Answer 2 · edited Sep 10 '22 at 09:50

Cassandra's default configuration sets the commitlog_sync mode to periodic, causing the commit log to be synced every commitlog_sync_period_in_ms milliseconds, so you can potentially lose up to that much data if all replicas crash within that window of time.
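As a rough illustration of what "up to that much data" means, the following Python sketch estimates an upper bound on acknowledged but not yet synced writes on one node. The function name and the write rate are hypothetical; the only value taken from this thread is the 10-second default period, and the real exposure can be smaller if the commit log is synced earlier for other reasons.

```python
def max_unsynced_writes(writes_per_second: float,
                        sync_period_ms: int = 10_000) -> float:
    """Upper bound on acked-but-unsynced writes within one sync period.

    sync_period_ms plays the role of commitlog_sync_period_in_ms;
    writes_per_second is an assumed, steady per-node write rate.
    """
    return writes_per_second * sync_period_ms / 1000.0


# Hypothetical node taking 5,000 writes/s with the 10 s default period.
print(max_unsynced_writes(5_000))
```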

answered Jun 24 '15 at 17:49 by EnjoyTheVibez · edited Sep 10 '22 at 09:50 by mirekphd