17

How can a distributed system be consistent and available (CA)?

I would argue that when a network partition occurs, CA is not possible in the sense that every node of the network, including the partitioned nodes that users are connected to, continues to be available and to answer with consistent data.

Marcellvs
pvjhs

3 Answers

32

It can't.

As often mentioned, the CAP theorem in its original form is a little misleading. It can be restated as

in the presence of a network partition, a distributed system is either available or consistent

so you are right. Generally, systems cannot be classified as only CA, CP or AP, since partition tolerance is a property of the system that describes what it chooses in case of a network partition. A system can therefore behave as AP at some times and as CP at others (although this is not common).

Another interesting point is that RDBMS databases are often placed on the CA side of the triangle. This is only the case in a single-node setup. Even with a master (write) / slave (read) setup, the system is not CA. If it is termed "CA" for some reason and cannot recover from network partitions, a split-brain scenario may happen: a new master is elected for the partition and chaos ensues, possibly breaking the consistency of the system.

Useful read: https://codahale.com/you-cant-sacrifice-partition-tolerance/.

David Szalai
  • The CAP theorem is about distributed systems, I don't understand why people talk about single node setups having something to do with the CAP theorem. Furthermore, in a distributed system, in case of CP systems, you lose availability, if some wire, connecting 2 parts of your cluster loses electrical power or whatever. When a CA system loses electrical power, it loses the letter A, so it's not really a CA system anymore, is it? – pavel_orekhov Jul 06 '23 at 01:29
  • Yeah I don't think CAP should be used to describe non-distributed systems, can't really see the practical point of it. "When a CA system loses electrical power..." - I think a preassumption of CAP is to have a working system, so if all nodes go down it's not really a distributed system anymore from a philosophical viewpoint I guess, but a bunch of metal and wires :D – David Szalai Jul 11 '23 at 12:28
6

It can, but it won't.

The CAP theorem reasons about guarantees when one or more nodes get isolated from the rest of the cluster. In such cases a node has three options, which result in the three known CAP trade-offs: i) it keeps responding to any requests it receives (AP); ii) it stops responding to requests until it can reach the others again (CP); iii) it shuts down before receiving any requests, eliminating the partition along with it (CA).

In other words, you can achieve CA by having your nodes shut down instead of tolerating the partition. Bear in mind, though, that partitions are likely to keep happening, so this converges to a cluster with a single node. I assume that is the opposite of what you want, since having a cluster with multiple nodes is kind of the whole point.

Therefore, in practice you end up choosing between CP and AP. See this answer for more illustrative examples.
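The three options above can be sketched as a toy model. This is only an illustration, not how any real database implements them: the `Policy` and `Node` names, the `on_partition` hook, and the choice of exceptions are all assumptions made for the example.

```python
from enum import Enum


class Policy(Enum):
    AP = "keep answering"
    CP = "refuse until the partition heals"
    CA = "shut down"


class Node:
    """Hypothetical node holding one value; models only the partition choice."""

    def __init__(self, policy, value):
        self.policy = policy
        self.value = value    # last value this node saw (may be stale)
        self.running = True

    def on_partition(self):
        # Called when the node notices it can no longer reach its peers.
        if self.policy is Policy.CA:
            self.running = False  # option iii: leave the cluster entirely

    def read(self, partitioned):
        if not self.running:
            # The node no longer exists for clients; they must go elsewhere.
            raise ConnectionError("node is down, try another node")
        if partitioned and self.policy is Policy.CP:
            # Option ii: stay up but give no answer, so consistency is kept.
            raise TimeoutError("no answer until peers are reachable again")
        # Option i (or no partition): answer, possibly with stale data.
        return self.value
```

An AP node keeps returning its (possibly stale) value during a partition, a CP node stays up but gives no answer, and a CA node is simply gone, so requests have to be served by the remaining nodes.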

João Matos
  • What a good reasoning! I've never thought of that – Octaviotastico Dec 17 '22 at 19:18
  • The nodes would of course need to have some pre-agreement on who shuts down in the case of partition, since no communication will be possible at that point, and they can't all shut down or you wouldn't have Availability. For example, supposing the nodes have unique numeric IDs established prior to the partition, they could agree: any node which cannot reach all nodes with a greater ID, must shut down. – gerardlouw Feb 12 '23 at 14:43
  • What is the difference between ii and iii? Aren't they indistinguishable by an outside observer? – Flavien May 11 '23 at 20:54
  • One is refusal to respond and the other is not existing at all, they are not necessarily indistinguishable – João Matos May 11 '23 at 21:48
  • Nodes shutting down instead of tolerating partitioning means that there's no availability, so, your system is not CA. It's just C. If you call a system that is unavailable in the event of partitioning a CA system, then we can call any CP system a CAP system, because when there's no partitioning it is available and consistent. Also, CAP theorem is about distributed systems, not single node setups, and the guy who came up with it did so in the context of distributed systems, he's a distributed systems researcher. – pavel_orekhov Jul 06 '23 at 01:34
  • @pavel_orekhov that there's no availability, so, your system is not CA -> "CAP-availability" states that for each request R you get a response R'. So if for ex you have a N-node cluster in which one got partitioned and shuts down, you'll still receive R' from any of the other N-1 nodes, so Availability stands. – João Matos Jul 06 '23 at 08:56
  • Also, CAP theorem is about distributed systems, not single node setups -> that's what I'm trying to say in the second paragraph, shutting down nodes goes against what we try to achieve and that's why **you won't** have CA – João Matos Jul 06 '23 at 08:56
  • I disagree with you. Because you only mention some corner case when 1 node gets removed. Partitioning should be looked at in general case. Furthermore, that 1 node that was removed from the cluster due to partitioning, could be serving a whole city, and the service could be made unavailable to them. – pavel_orekhov Jul 06 '23 at 09:15
  • @pavel_orekhov, If you have only one node available for an entire city then you have a single point of failure from the perspective of those that live there, which for all intents and purposes is the same as having a single-node cluster – João Matos Jul 06 '23 at 09:56
  • The definition of CAP-Availability "for each request R you get a response R'" is applicable no matter how many nodes you remove – João Matos Jul 06 '23 at 09:57
  • The cap theorem is applicable to the cluster formed by all nodes that are reachable by the client. Looking at that set of nodes, one may stop hearing back from the other members and from that moment on it has only 3 choices which lead to the 3 CAP settings: 1) keeps working and responding to requests (AP); 2) keeps working but won't provide an answer before hearing back to the cluster (CP); 3) stops working so that upcoming requests reach its colleague nodes (CA) – João Matos Jul 06 '23 at 10:12
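The pre-agreed shutdown rule suggested by gerardlouw in the comments above ("any node which cannot reach all nodes with a greater ID must shut down") can be sketched as follows. The function names and the example IDs are assumptions made for illustration; the sketch also assumes that every node knows the full set of IDs before the partition and that reachability is symmetric within a group.

```python
def must_shut_down(my_id, all_ids, reachable_ids):
    """Return True if this node cannot reach every node with a greater ID."""
    higher = {i for i in all_ids if i > my_id}
    return not higher.issubset(reachable_ids)


def survivors(all_ids, groups):
    """Given disjoint groups of mutually reachable nodes, return who stays up."""
    alive = set()
    for group in groups:
        for node in group:
            # A node can reach everyone in its own group except itself.
            if not must_shut_down(node, all_ids, group - {node}):
                alive.add(node)
    return alive


# Five nodes split into {1, 2} and {3, 4, 5}: only the second group stays up.
print(survivors({1, 2, 3, 4, 5}, [{1, 2}, {3, 4, 5}]))
```

Under this rule the node with the greatest ID never shuts down, so at least one node always remains to answer requests.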
2

Dr. Stonebraker says: The guidance from the CAP theorem is that you must choose either A or C, when a network partition is present. As is obvious in the real world, it is possible to achieve both C and A in this failure mode.

See this for thoughts on why CA can exist:

CA is a specification of the operating range: you specify that the system does not work well under partition or, more precisely, that partitions are outside the operating range of the system.

My background is far from these theoretical considerations, and I must say they are highly confusing. I am researching distributed blockchain systems and I don't see why those "generalized" definitions of C, A and P must always apply. If, let's say, 5% of nodes fail or are otherwise partitioned, the consensus still functions. If an end user is connected to a partitioned node, the node could let the user know it lost its connection. I don't even see how any major blockchain network is CP without defining conditions such as "if a certain number of nodes fail or get partitioned, the consensus halts".

Marcellvs