I'm learning about replication strategies and wondering about the advantages of Cassandra (masterless) over MongoDB (leader/follower).

From what I've read, they both scale writes the same way, because writes in masterless are sent to all nodes via a quorum, and in leader/follower the leader will eventually send writes to all followers.

For reads, if you are using a masterless system where R + W > N, then it is also strongly consistent, like leader/follower where you only read from the leader.

So when would you use leader/follower over masterless? How do they scale differently with reads/writes?

The only 2 differences I could find are:

  1. Leader/follower may scale reads better than masterless if you read from followers, but then you sacrifice consistency because of replication lag.
  2. Masterless has less downtime, since we don't have to worry about electing a new leader when one fails.
J Bailey
  • Pls Refer - https://stackoverflow.com/questions/48434860/master-less-model-in-cassandra-vs-master-slave-model-in-mongodb – Pankaj Jun 22 '22 at 17:54
  • Curious as to the sources you've been reading. The "Master/Slave" terminology has been deprecated in favor of "Primary/Secondary" or "Leader/Follower" for a couple of years now. NoSQL tech changes a lot, so it might be a good idea to make sure that you're working with more recent sources. – Aaron Jun 23 '22 at 15:21
  • @Pankaj that post described how cassandra works and its tunable consistency, but I was wondering when to choose leader/follower vs masterless – J Bailey Jun 23 '22 at 19:39

1 Answer

Single-leader replication is much simpler for clients to reason about. Clients don't need to deal with read repairs, and it is much simpler to support ACID properties in a single-leader replication system.

Also, with single-leader replication you have more options for how to handle data flows - you could go synchronous, semi-synchronous or asynchronous. Each of these is a good fit for the right use case.

As for consistency guarantees, the CAP theorem applies to both cases; there is no way around that.

A few comments on statements in your question:

  • "writes in masterless are sent to all nodes via a quorum, and in master/slave the master will eventually send writes to all slaves"

In the masterless (Dynamo-style, to be more specific) approach, writes are sent to every replica node, but the write is confirmed to the client once W nodes have acknowledged it. With this approach, as soon as a client gets a write confirmation, it can reason that a read started right after that will see the data. If the write fails, meaning fewer than W nodes confirmed it, then it is unknown what clients will see. There are many edge cases around this.

As for leader/follower - writes are sent to followers eventually only in async replication mode. Your system could use sync mode as well. In that case, a write is confirmed to the client only once all replicas have received the update. Hence, every subsequent read will see the latest data.
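To make the sync/async difference concrete, here is a minimal in-memory sketch in Python. The `Leader` and `Follower` classes and the `flush` method are hypothetical names made up for illustration; this is not how MongoDB or any real database implements replication, just a toy model of when the acknowledgement is returned relative to when followers get the data.

```python
class Follower:
    """A replica that only stores what the leader ships to it."""

    def __init__(self):
        self.data = {}


class Leader:
    """Toy single-leader store (hypothetical, for illustration only)."""

    def __init__(self, followers, synchronous):
        self.data = {}
        self.followers = followers
        self.synchronous = synchronous
        self.pending = []  # updates not yet shipped to followers (async mode)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Sync mode: every follower has the update before we acknowledge.
            for follower in self.followers:
                follower.data[key] = value
        else:
            # Async mode: acknowledge now, replicate later (replication lag).
            self.pending.append((key, value))
        return "ack"

    def flush(self):
        """Simulate the background replication stream catching up."""
        for key, value in self.pending:
            for follower in self.followers:
                follower.data[key] = value
        self.pending.clear()


async_followers = [Follower(), Follower()]
leader = Leader(async_followers, synchronous=False)
leader.write("user:1", "alice")
print(async_followers[0].data.get("user:1"))  # None: follower is still behind
leader.flush()
print(async_followers[0].data.get("user:1"))  # alice
```

In the async case a read from a follower between `write` and `flush` returns stale data, which is the replication lag mentioned in the question. With `synchronous=True` that window does not exist in this model, at the cost of every write waiting on all followers.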

  • "masterless where R + W > N, then it is also strongly consistent "

There are many edge cases around this when writes partially fail. Usually the answer is read repair, which basically says that if one client has read a partially saved value (a write confirmed by fewer than W nodes), then every subsequent read will see that value as well. But as I said, this is more complicated to reason about.
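As a rough illustration of why R + W > N matters and why partial writes are awkward, here is a small brute-force sketch in Python. The function names are made up for this example, and it ignores everything a real system has to deal with (timestamps, sloppy quorums, concurrent writes); it only checks which replica sets overlap.

```python
from itertools import combinations


def quorums_always_overlap(n, w, r):
    """True if every possible write set of size w shares a node with every read set of size r."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


def reads_seeing_write(n, r, holders):
    """Count how many read sets of size r include at least one node holding the write."""
    read_sets = list(combinations(range(n), r))
    seeing = [rs for rs in read_sets if set(rs) & holders]
    return len(seeing), len(read_sets)


print(quorums_always_overlap(3, 2, 2))  # True: 2 + 2 > 3
print(quorums_always_overlap(3, 1, 1))  # False: 1 + 1 <= 3

# A write that failed with W=2 but still landed on one node (node 0):
print(reads_seeing_write(3, 2, {0}))  # (2, 3): some readers see it, some don't
```

The last line is the partial-failure case: the write was reported as failed, yet two of the three possible read quorums still return it. That is the kind of situation read repair is meant to resolve.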

AndrewR
  • Thanks that was very helpful. I'm still a little confused on when I'd choose masterless over leader/follower if leader/follower is more flexible and easier for clients? Are there any good resources that give hard rules on when to choose what? – J Bailey Jun 23 '22 at 19:33
  • I always pick strongly consistent and simple unless I need to go the other way because of the problem at hand. Either of these technologies may work. What is the actual problem you are trying to solve? What are the use cases? What are the data access patterns? What is the SLA? What is the TPS and the size of the data exchange packets? And, probably one of the more important questions - which technologies/systems do you (or your team) have experience with? For example, GitHub is a huge service and they use MySQL with synchronous replication (if I'm not mistaken) - could they use Cassandra? Probably. Sorry for the vague answer :) – AndrewR Jun 23 '22 at 19:55
  • Let me ask you this question - if you could use either MySQL or PostgreSQL, which one would you use? The idea is that your problem may be better solved with one of them. Or it is possible that it doesn't matter - e.g. if you run a small blog, pick either of them. The same goes for single-leader/multi-leader/leaderless systems. – AndrewR Jun 23 '22 at 19:57