14

Could you please explain why is Cassandra not linearizable even when quorum based read and writes are used?

Linearizability defined as

If operation B started after operation A successfully completed, then operation B must see the system in the same state as it was on completion of operation A, or a newer state.

Anurag Sharma
  • 2,409
  • 2
  • 16
  • 34

2 Answers2

13

Edit considering Cassandra foreground Read Repair:

Writes that fail because only a partial set of replicas are updated could lead to two different readers seeing two different values of data. This is because of the lack of rollbacks in simple quorum-based consistency approaches. This behavior breaks the linearizability guarantees for single-key reads. As described in this discussion, a distributed consensus protocol such as Raft or Paxos is a must-have for such a guarantee.

enter image description here

Also, other phenomena such as clock drift and leap second can break the Cassandra session consistency.

Earlier Answer (without considering Cassandra foreground read repair):

Summary: In Cassandra write may not feel atomic. Some nodes get writes faster than others thus even if we rely on quorum the result depends on the set of nodes that return values and what values they hold at that point.

Also, to explain definition of linearizability adding to definition in bold

If operation B started after operation A successfully completed, then operation B must see the system in the same state as it was on completion of operation A, or a newer state (but never old state again) .

Copying from Martin Klepmann's Data Intensive Applications book

Linearizability and quorums Intuitively, it seems as though strict quorum reads and writes should be linearizable in a Dynamo-style model. However, when we have variable network delays, it is possible to have race conditions, as demonstrated in Figure 9-6.

enter image description here

In Figure 9-6, the initial value of x is 0, and a writer client is updating x to 1 by sending the write to all three replicas (n = 3, w = 3). Concurrently, client A reads from a quorum of two nodes (r = 2) and sees the new value 1 on one of the nodes. Also concurrently with the write, client B reads from a different quorum of two nodes, and gets back the old value 0 from both.

The quorum condition is met (w + r > n), but this execution is nevertheless not linearizable: B’s request begins after A’s request completes, but B returns the old value while A returns the new value. (It’s once again the Alice and Bob situation from Figure 9-1.)

Interestingly, it is possible to make Dynamo-style quorums linearizable at the cost of reduced performance: a reader must perform read repair (see “Read repair and antientropy” on page 178) synchronously, before returning results to the application [23], and a writer must read the latest state of a quorum of nodes before sending its writes [24, 25]. However, Riak does not perform synchronous read repair due to the performance penalty [26]. Cassandra does wait for read repair to complete on quorum reads [27], but it loses linearizability if there are multiple concurrent writes to the same key, due to its use of last-write-wins conflict resolution.

Moreover, only linearizable read and write operations can be implemented in this way; a linearizable compare-and-set operation cannot, because it requires a consensus algorithm [28].

In summary, it is safest to assume that a leaderless system with Dynamo-style replication does not provide linearizability.

And a bit more explaination about Linearizability vs Serializability:

enter image description here

Anurag Sharma
  • 2,409
  • 2
  • 16
  • 34
  • 1
    Excellent find, sold me on buying the book, thanks! The diagram with a demonstration of multiple clients (1 writer 2 readers) is a nice touch. – Andy Tolbert Jun 28 '19 at 21:52
  • 1
    Can you mark this as the correct answer? I feel it is a much more concise explanation and mine doesn't quite answer the question correctly. – Andy Tolbert Jun 28 '19 at 21:55
  • 1
    Please correct me if I’m wrong, but the scenario from the figure doesn’t prove that Cassandra is not linearizable, because Cassandra does read repair in quorum reads and would have fix the inconsistency in Replica 2 when Reader A reads x. – eric tan Apr 30 '21 at 08:54
  • @erictan Thanks for pointing the read repair case. I have made an edit to include that scenario. – Anurag Sharma May 02 '21 at 09:04
  • In the "inconsistent reads after write failure" example wouldn't the first reader (the one who sees the updated node) perform foreground read repair? If the repair is performed, the second reader would see the updated value (provided 2nd read starts after the 1st one is finished) – Nikita Tkachenko Feb 01 '22 at 18:54
  • also, could you please elaborate on "Cassandra ... loses linearizability if there are multiple concurrent writes to the same key, due to its use of last-write-wins conflict resolution"? – Nikita Tkachenko Feb 01 '22 at 18:55
3

EDIT: In hindsight this isn't the best explanation. I recommend reading Anurag's answer below which is much more concise.

Since normal Cassandra operations don't observe existing state that it is changing, quorum consistency alone is not considered 'Linearizable'.

For example, if you were to to adjust the balance of a bank account, you would need to know the current balance in order to adjust it. Consider a client that executes the following operations:

A. SELECT balance FROM account WHERE id='x' (assume this returns 5.12)
B. UPDATE account SET balance=4.12 WHERE id='x' (subtract 1$ from balance)

The problem is, from the perspective of Cassandra, the operation B doesn't effectively 'see' A since it's not considering the existing state of the data or any other operations that may be occurring for that matter. Another client could be updating balance for the same account during the submission of B.

Lightweight Transactions in Cassandra 2.0 describes how lightweight transactions provide 'linearizable consistency' by providing constructs that ensure that operations are performed in sequence for a given partition and are not interrupted by others. So instead of my previous example, you can now do:

A. SELECT balance FROM account WHERE id='x' (assume this returns 5.12)
B. UPDATE account SET balance=4.12 WHERE id='x' IF balance=5.12

The use of IF balance=5.12 instructs Cassandra to begin a lightweight transaction, which uses a paxos consesus protocol for leadership election and to ensure operations are applied sequentially. If the state of balance does not meet the condition, the update will not be applied (indicated in a successful response with a was_applied boolean column). If C* is not able to achieve this within some timeout (due to contention or some other factors), the operation will fail, will not be applied, and the client will be surfaced a timeout.

Andy Tolbert
  • 11,418
  • 1
  • 30
  • 45
  • Could you please also explain then what would be the difference between Serialzability and Linearizability in case of C* with some example – Anurag Sharma Jun 28 '19 at 05:22
  • 1
    Peter Bailis' article on Linearizability and Serializability is a good reference on the two: http://www.bailis.org/blog/linearizability-versus-serializability/ . In the context of cassandra, serializability cannot be achieved as it has no concept of transactions involving multiple objects. LWT provides linearizable consistency over a single one object (row). It's a bit confusing that there is a consistency level called 'SERIAL' that applies to reads and LWTs as it can cause some to conflate that with serializability. – Andy Tolbert Jun 28 '19 at 14:48