
According to a DataStax article, strong consistency can be guaranteed if R + W > N, where R is the consistency level of read operations, W is the consistency level of write operations, and N is the number of replicas.

What does strong consistency mean here? Does it mean that 'every time' a query's response is returned by the database, the response will 'always' be the last updated value? If the conditions for strong consistency are maintained in Cassandra, are there then no scenarios where the data returned might be inconsistent? In short, does strong consistency mean 100% consistency?

Edit 1

Adding some additional material on scenarios where Cassandra might not be consistent even when R + W > RF:

  1. Write fails with Quorum CL
  2. Cassandra's eventual consistency
Vishal Sharma

5 Answers


Cassandra offers tunable consistency, with trade-offs you can choose.

R + W > N simply means that there must be at least one overlapping node in your read/write round trip that has the newest data, for reads to be consistent.

For example, if you write at CL.ONE you will need to read at CL.ALL to be sure of getting a consistent result: N + 1 > N. But you might not want CL.ALL, because then you cannot tolerate even a single node failure in your cluster.

Often you can choose CL.QUORUM at both read and write time to ensure consistency and tolerate node failures. For example, at RF=3 a QUORUM needs (3/2)+1 = 2 nodes available, so R + W > N becomes 4 > 3: your requests are consistent AND you can tolerate a single node failure.
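
For illustration, here is a minimal sketch of choosing consistency levels per query with the DataStax Python driver (the driver choice, the `demo` keyspace, and the `users` table are my assumptions, not part of the answer):

```python
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
from cassandra import ConsistencyLevel

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("demo")  # hypothetical keyspace

# Write at QUORUM: with RF=3 the write must be acknowledged by 2 replicas.
write = SimpleStatement(
    "INSERT INTO users (id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(write, (1, "alice"))

# Read at QUORUM: 2 of 3 replicas must answer, so at least one of them
# also acknowledged the write above (R + W = 4 > 3 = N).
read = SimpleStatement(
    "SELECT name FROM users WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
row = session.execute(read, (1,)).one()
```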

One thing to keep in mind: it is really important to have tightly synchronized clocks on all your nodes (Cassandra and application); you will want to have NTP up and running.

Mandraenke

For both reads and writes, the consistency levels ANY (writes only), ONE, TWO, and THREE are considered weak, whereas QUORUM and ALL are considered strong.

Horia

While this is an old question, I thought I would chip in to set the record straight.

R+W>RF does not imply strong consistency

A system with R+W > RF will only be eventually consistent. The claimed strong-consistency guarantee breaks during node failures or in between writes. For example, consider the following scenario:

Assume that there are 3 nodes A,B,C with RF=3, W=3, R=2 (hence, R+W = 5 > 3 = RF)

Further assume that key k is associated with the value v, i.e. (k,v) is stored in the database. Suppose the following series of actions occurs:

  • t=1: (k,v1) write request is sent to A,B,C from a user
  • t=2: (k,v1) reaches A and is written to store at A
  • t=3: Reader 1 sends a read request for key k, which is replied to by A and B
  • t=4: Reader 1 receives the response (k,v1), by the latest-write-wins rule
  • t=5: Reader 1 sends another read request which gets served by nodes B and C
  • t=6: Reader 1 receives the response (k,v), which is an older value: INCONSISTENCY!
  • t=7: (k,v1) reaches C and is written to store at C
  • t=8: (k,v1) reaches B and is written to store at B

This demonstrates that W+R > RF cannot guarantee strong consistency. To ensure strong consistency you might want to use another algorithm, such as Paxos or Raft, that can help in ensuring that the writes are atomic. You can read an interesting article on the same here (do check out the FAQ section).
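To make the timeline concrete, here is a toy simulation of the scenario above (a simplified model with explicit timestamps and a last-write-wins read, not actual Cassandra code):

```python
# Three replicas, each holding (value, write_timestamp) for key k.
replicas = {"A": ("v", 0), "B": ("v", 0), "C": ("v", 0)}

def read(nodes):
    # The coordinator reconciles replies by latest-write-wins.
    return max((replicas[n] for n in nodes), key=lambda r: r[1])[0]

# t=2: the write of (k, v1) has reached only A so far.
replicas["A"] = ("v1", 1)

print(read(["A", "B"]))  # 'v1' - A's newer value wins (t=3..4)
print(read(["B", "C"]))  # 'v'  - an older value comes back (t=5..6)
```

Because W=3, the write is still in flight at t=3, so the "overlapping node" argument does not apply yet; the second read sees stale data even though R+W > RF.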


Edit:

Cassandra does have an internal mechanism (called blocking read repair) that triggers synchronous writes before the response from the DB is sent back to the client. This kind of synchronous read repair occurs when there are inconsistencies among the nodes queried to achieve the read consistency level, and it ensures something known as Monotonic Read Consistency [see below for definitions]. It causes the (k,v1) in the example above to be written to node B before the response to the first read request is returned, so the second read request would also see the updated value. (Thanks to @Nadav Har'El for pointing this out.)

However, this still does not guarantee strong consistency. Below are some definitions to clear things up:

Sequential/Strong Consistency: the result of any execution is the same as if the reads and writes occur in some order, and the operations of each individual processor appear in this sequence in the order specified by its program [as defined by Leslie Lamport]

Monotonic Read Consistency: once you read a value, all subsequent reads will return this value or a newer version

Sequential consistency would require the client program/reader to see the latest value that was written, since the write statement is executed before the read statement in the sequence of program instructions.
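
To see the effect of blocking read repair in the same toy model, the coordinator can write the reconciled value back to the contacted replicas before replying (again a simplified sketch, not Cassandra's actual code path):

```python
replicas = {"A": ("v1", 1), "B": ("v", 0), "C": ("v", 0)}

def read_with_blocking_repair(nodes):
    newest = max((replicas[n] for n in nodes), key=lambda r: r[1])
    for n in nodes:            # synchronous read repair: contacted
        replicas[n] = newest   # replicas are fixed up...
    return newest[0]           # ...before the response is sent

print(read_with_blocking_repair(["A", "B"]))  # 'v1', and B now stores v1
print(read_with_blocking_repair(["B", "C"]))  # 'v1': the read is monotonic
```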

Prakhar Agrawal
  • You're wrong. The "consistency" you are expecting is called "monotonic read" (i.e., never read an older value after already having read a new one), and Cassandra actually does support it! In step t=3, when the coordinator receives from A and B and sees they are different, it reconciles (to generate the result to be returned in t=4) but ALSO sends the reconciled result to both A and B, synchronously (not returning before the read finishes). So when the read finishes, returning (k,v1), this data is on BOTH A and B. The next read, no matter which two nodes it picks, will also return (k,v1). – Nadav Har'El Jun 05 '19 at 21:18
  • @NadavHar'El "The protocol is said to support strong consistency if: All accesses are seen by all parallel processes (or nodes, processors, etc.) in the same order (sequentially)" - from wikipedia. A strongly (sequentially) consistent system, therefore, has to have the monotonic read consistency. Should MR be violated, there is no point in speaking of the strong consistency. – Prakhar Agrawal Jun 06 '19 at 02:26
  • But as I explained, Cassandra *does* do monotonic read repair, thanks to its synchronous reconciliation on read. At least inside one DC. – Nadav Har'El Jun 06 '19 at 05:50
  • @NadavHar'El Was reading about Cassandra at the link https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlClientRequestsRead.html . This seems to mention that the read repair is asynchronous and runs in the background, as opposed to blocking the read request. This would imply that the example above still stands, since the reader's second request might reach replicas B & C well before this read repair occurs. – Prakhar Agrawal Jun 06 '19 at 06:59
  • Cassandra has two different features. One is probabilistic read repair, which runs in the background sometimes, after having achieved CL. The other one is reconciliation. It happens when a coordinator receives different data from different replicas before achieving CL. Then, Cassandra writes the reconciled data back - and *waits* for the write to complete. There are delicate issues regarding multi-DC (Cassandra doesn't wait for writes on other DCs) and CL (which CL should the write have?), but they didn't simply forget to handle this issue. https://issues.apache.org/jira/browse/CASSANDRA-2494 – Nadav Har'El Jun 06 '19 at 07:20
  • @NadavHar'El Thanks. Having a read CL of QUORUM would imply requiring a write CL of QUORUM to achieve consistency in such a case, wouldn't it? Also, if the read repairs across DCs are not synchronous, would this mean that the consistency guarantee is broken there? – Prakhar Agrawal Jun 06 '19 at 07:41
  • Although I'm to some degree a Cassandra expert (having been involved in rewriting it in C++, namely ScyllaDB), I don't remember every detail and you may want to ask bigger experts, or inspect the code. If I remember correctly, reconciliation waits for LOCAL_QUORUM writes to succeed regardless of the read CL, and this is good enough for QUORUM writes. And yes, read monotonicity *is* broken across DCs if I remember correctly. It doesn't have to be this way; it was just a shortcut taken to avoid very high latencies... – Nadav Har'El Jun 06 '19 at 12:54
  • What will happen if at t=3, the read query is serviced by B and C? – Vishal Sharma Jun 08 '19 at 05:44
  • @VishalSharma Since both B & C have (k,v), the coordinator sees no inconsistency in satisfying the read CL of 2 and replies with (k,v). My example was just a disproof by counterexample, to show that R+W>RF will not guarantee consistency. – Prakhar Agrawal Jun 08 '19 at 16:18
  • R = All, W = One will ensure strong consistency, right? – quangh Jul 12 '20 at 03:42
  • Cassandra writes are atomic – Junbang Huang Sep 02 '20 at 17:50
  • This answer still didn't explain why it is not strong consistency if we have the read repair mechanism – Stan Sep 17 '22 at 03:30

Yes. If R + W is greater than the number of replicas, then you will always get consistent data: 100% consistency. But you will have to trade away availability to achieve that higher consistency.

Cassandra has the concept of tunable consistency (consistency can be set on a per-query basis).

undefined_variable
  • https://stackoverflow.com/questions/30935174/what-will-happen-if-write-failed-in-cassandra-cluster-when-using-quorum-cl The accepted answer there says that strong consistency cannot be provided even if R+W>N; is some kind of tuning necessary, apart from the R+W>N condition, to achieve strong consistency? – Farsan Rashid Jul 17 '18 at 07:18
  • I found one more article which highlights a scenario in which inconsistent data will be returned even when R+W>RF (the second answer in the link given by @FarsanRashid also highlights one of the scenarios): https://blog.scottlogic.com/2017/10/06/cassandra-eventual-consistency.html – Vishal Sharma Feb 28 '19 at 06:26
  • R + W doesn't ensure strong consistency – manpreet singh Jul 06 '20 at 02:39
  • @manpreetsingh https://cassandra.apache.org/doc/latest/architecture/dynamo.html#picking-consistency-levels – undefined_variable Jul 06 '20 at 13:06

I would actually regard this strong consistency as strong read consistency. And it is sessional, a.k.a. Monotonic Read Consistency (refer to @NadavHar'El's answer).

But it is not sequential consistency, as Cassandra doesn't fully support locks or transactions, and doesn't serialize write operations. There are only lightweight transactions, which support local serialization of write operations and serialization of read operations.

To make things easy to understand, let's say we have three nodes A, B, C, and set the read consistency level to 3 and the write level to 1.

If there is only one client, it writes to any node, say A. B and C might not be synchronized yet (eventually they will be: eventual consistency).

But when the client reads again, it must get responses from all three nodes, and by comparing the timestamps, A's latest record wins. This is Monotonic Read Consistency.

However, if there are two clients trying to update the record at the same time, or if they each try to read the value first and then rewrite it (e.g. increase a column by 100) at the same time, things break: clients C1 and C2 both read the current column value as 10, and both decide to increase it by 100. C1 just needs to write 110 to one node, client C2 will do the same, and the final result on any node can only be 110 at most.

Then we lose 100 in these operations (a lost update). This is a classic race condition, and it has to be fixed by serializing the operations with some form of locking, just as other SQL DBs implement transactions; see the sketch below.
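
As an illustration of such serialization, Cassandra's lightweight transactions do a Paxos-backed compare-and-set. A minimal sketch with the DataStax Python driver follows; the `tallies` table, its columns, and the open `session` are assumptions for the example, not part of the answer:

```python
# Only apply the update if the value is still the one this client read (10).
result = session.execute(
    "UPDATE tallies SET value = %s WHERE id = %s IF value = %s",
    (110, 1, 10),
)
if not result.was_applied:        # the LWT's [applied] column was false
    current = result.one().value  # a failed CAS returns the current value;
    # re-read and retry with `current` so the increment is not lost
```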

I know Cassandra now has a counter column type, which might solve this particular case, but it is still limited compared to full transactions. And Cassandra is also not supposed to be transactional, as it is a NoSQL database that sacrifices consistency for availability.

Stan