
I've heard about two kinds of database architectures.

  • master-master

  • master-slave

Isn't master-master more suitable for today's web, because it's like Git: every unit has the whole set of data, and if one goes down it doesn't matter much?

Master-slave reminds me of SVN (which I don't like), where you have one central unit that handles everything.

Questions:

  1. What are the pros and cons of each?

  2. If you want to have a local database on a mobile phone like the iPhone, which one is more appropriate?

  3. Is the choice of one of these a critical factor to consider thoroughly?

never_had_a_name
  • CAP Theorem (Consistency, Availability, Partition Tolerance) states that you cannot have all three together. Depending on the application you can choose either one. – Pritam Banerjee Jul 25 '19 at 07:35
  • @PritamBanerjee, we can choose two out of three; not one of three. So, we can either have CA or CP or AP. – user674669 Jun 15 '23 at 16:40

2 Answers


While researching the various database architectures myself, I compiled a good bit of information that might be relevant to someone else researching this in the future. I came across:

  1. Master-Slave Replication
  2. Master-Master Replication
  3. MySQL Cluster

I have decided to settle on MySQL Cluster for my use case. However, please see below for the various pros and cons that I have compiled.

1. Master-Slave Replication

Pros

  • Analytic applications can read from the slave(s) without impacting the master
  • Backups of the entire database have relatively little impact on the master
  • Slaves can be taken offline and synced back to the master without any downtime

Cons

  • In the event of a failure, a slave has to be promoted to master to take its place. No automatic failover
  • Downtime and possible loss of data when a master fails
  • All writes also have to be made to the master in a master-slave design
  • Each additional slave adds some load to the master, since the binary log has to be read and data copied to each slave
  • Application might have to be restarted
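
As a rough illustration of what this setup involves, here is a minimal sketch of classic MySQL binary-log replication (host names, user names and passwords are placeholders, and this is the pre-8.0.22 syntax; newer releases spell these commands CHANGE REPLICATION SOURCE TO / START REPLICA):

    -- On the master (my.cnf must already set a unique server-id and enable log_bin):
    CREATE USER 'repl'@'%' IDENTIFIED BY 'repl_password';  -- replication account (placeholder credentials)
    GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
    SHOW MASTER STATUS;  -- note the current binary log file and position

    -- On the slave (my.cnf must set a different server-id):
    CHANGE MASTER TO
        MASTER_HOST = 'master.example.com',    -- placeholder host
        MASTER_USER = 'repl',
        MASTER_PASSWORD = 'repl_password',
        MASTER_LOG_FILE = 'mysql-bin.000001',  -- file and position taken from SHOW MASTER STATUS
        MASTER_LOG_POS = 4;
    START SLAVE;
    SHOW SLAVE STATUS\G  -- Slave_IO_Running and Slave_SQL_Running should both say Yes

If the master fails, someone (or an external tool) has to promote a slave and re-point the remaining slaves and the application at it, which is the lack of automatic failover noted above.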

2. Master-Master Replication

Pros

  • Applications can read from both masters
  • Distributes write load across both master nodes
  • Simple, automatic and quick failover

Cons

  • Loosely consistent
  • Not as simple as master-slave to configure and deploy
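
To give a feel for the MySQL side of this, here is a minimal sketch assuming two servers that each replicate from the other (the CHANGE MASTER TO / START SLAVE steps are the same as in the master-slave sketch, run on both sides). Staggered auto-increment settings are the common trick for keeping the two masters from generating colliding primary keys:

    -- On master A (master B mirrors this with server-id = 2 and offset = 2):
    SET GLOBAL auto_increment_increment = 2;  -- both masters step auto-increment ids by 2
    SET GLOBAL auto_increment_offset    = 1;  -- A generates 1,3,5,...; B generates 2,4,6,...

Note that the offsets only prevent duplicate auto-increment keys; they do nothing about two masters accepting conflicting updates to the same row, which is where the "loosely consistent" con above comes from.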

3. MySQL Cluster

The new kid in town, based on the MySQL Cluster design. MySQL Cluster was developed with high availability and scalability in mind and is the ideal solution for environments that require no downtime, high availability and horizontal scalability.

See MySQL Cluster 101 for more information

Pros

  • (High Availability) No single point of failure
  • Very high throughput
  • 99.99% uptime
  • Auto-Sharding
  • Real-Time Responsiveness
  • On-Line Operations (Schema changes etc)
  • Distributed writes

Cons

You can visit my blog for a full breakdown, including architecture diagrams, that goes into further detail about the 3 architectures mentioned.
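
From the SQL side, using MySQL Cluster mostly comes down to choosing the NDB storage engine for your tables; a minimal sketch (the cluster itself needs a management node and at least two data nodes, whose configuration is omitted here, and the table and column names below are just examples):

    -- On any SQL node attached to the cluster:
    CREATE TABLE orders (
        id          INT NOT NULL AUTO_INCREMENT,
        customer_id INT NOT NULL,
        total       DECIMAL(10,2),
        PRIMARY KEY (id)
    ) ENGINE=NDBCLUSTER;  -- rows are automatically partitioned (auto-sharded) across the
                          -- data nodes and replicated between them for high availability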

Skillachie
  • Can you also write something about Galera? Percona XtraDB Cluster? – Sybil Jun 21 '18 at 12:52
  • "Application might have to be restarted" as part of cons. What does it mean? – Lily Feb 23 '19 at 21:23
  • If you have to change the IP of the DB server, then it will need to be configured in the application as well to read from the newly elected master. As a result you might need to restart your app to pick up the new configuration settings. It all depends on your current setup. You could also use a floating IP to bypass this. Just to give you a general idea. – Skillachie Feb 25 '19 at 20:57

We're trading off availability, consistency and complexity. To address the last question first: does this matter? Yes, very much! The choices concerning how your data is to be managed are absolutely fundamental, and there's no "Best Practice" for dodging the decisions. You need to understand your particular requirements.

There's a fundamental tension:

One copy: consistency is easy, but if it happens to be down, everybody is out of luck, and if people are remote they may pay horrid communication costs. Bring portable devices, which may need to operate disconnected, into the picture and one copy won't cut it.

Master-Slave: consistency is not too difficult because each piece of data has exactly one owning master. But then what do you do if you can't see that master? Some kind of postponed work is needed.

Master-Master: well, if you can make it work then it seems to offer everything: no single point of failure, and everyone can work all the time. The trouble with this is that it is very hard to preserve absolute consistency. See the Wikipedia article for more.
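
As a concrete (hypothetical) illustration of that consistency trouble, imagine two users hitting two different masters while the link between the masters is down, both trying to buy the last item in stock (the table and column names are made up for the example):

    -- Both masters currently agree that item 42 has qty = 1 in stock.

    -- User 1, connected to master A:
    UPDATE stock SET qty = qty - 1 WHERE item_id = 42 AND qty > 0;  -- succeeds locally
    -- User 2, connected to master B (A's write hasn't replicated yet):
    UPDATE stock SET qty = qty - 1 WHERE item_id = 42 AND qty > 0;  -- also succeeds locally

    -- When the masters resynchronise, depending on how the changes are replayed the stock
    -- either goes negative or one decrement is silently lost; either way the item has been
    -- sold twice, and something outside the database has to sort out the conflict.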

Wikipedia seems to have a nice summary of the advantages and disadvantages:

Advantages

  • If one master fails, other masters will continue to update the database.

  • Masters can be located in several physical sites i.e. distributed across the network.

Disadvantages

  • Most multi-master replication systems are only loosely consistent, i.e. lazy and asynchronous, violating ACID properties.

  • Eager replication systems are complex and introduce some communication latency.

  • Issues such as conflict resolution can become intractable as the number of nodes involved rises and the required latency decreases.

djna
  • CouchDB uses MVCC. Does this sort of handle the consistency problem faced with multiple masters, because when one is brought online again, the versioning system handles the consistency and this master will get the correct updated data? – never_had_a_name Sep 18 '10 at 04:00
  • But what happens when two users do something contradictory - like two users attempting to buy the last item in stock? Imagine a scenario where we have two masters and each user is hitting a different master, then we get some sort of communications glitch - in the end there will either be a compromise of integrity, or reduced availability - one user gets told "sorry mate, I really don't know what's happening until I talk to the other master", or we have a nasty conflict when comms are restored - and those can get really complicated. – djna Sep 18 '10 at 06:42
  • What do financial trading or stock markets use? They would be hitting this problem all the time? – CMCDragonkai Jun 20 '14 at 05:50
  • Where you need a single, updating, "truth" (as in financial systems) you need Master/Slave or indeed just Master. Where you can patch up the truth later (think merge conflicts in a revision control system like Git) then you can use Master/Master. – djna Jun 21 '14 at 05:30
  • djna makes a very salient observation. The database now has to have some sort of "tiebreaker" logic. What is most important? The most "recent" data? That makes sense if you are re-writing a field, but it doesn't make sense if you are doing a "counter" and you need all processes to increment (or decrement) before returning a result. Especially so you don't sell out-of-stock items. If you had a network partition, what happens when it comes back together? All of this is CAP theorem stuff. This is also where you can have algorithms like Paxos to develop consensus between different machines. – Peter Corless Apr 27 '15 at 22:39
  • This is a very helpful answer. One thing I don't quite understand is that since Master-Master replication has so many advantages over Master-Slave replication, why are companies still doing Master-Slave replication? – Stanleyrr Jun 12 '18 at 03:41
  • Stanleyrr, perhaps my answer didn't say it loud enough: Master-Master is hard to get right. Usually there is some sacrifice of consistency, there is some "eventual consistency" pattern. That's fine for some classes of business problems, but where the numbers matter (bank balances, inventories) you need to think very hard about what level of inconsistency you can deal with. Life is much simpler with just one copy of the truth. Study that final "Disadvantages" section. – djna Jun 12 '18 at 05:08
  • @djna in cases where ACID is a must, such as the financial applications mentioned, how are they able to scale? Distributed transactions and resorting to 2PC? – kibe Mar 15 '22 at 00:07
  • @kibe distributed transactions and 2PC are classically needed when we have related, but separate, "truths" in two otherwise independent systems and need to update both truths for consistency. Here we are talking about keeping two copies of the same truth, hence allowing work to be distributed across multiple machines. A Master-Master system will use some 2PC-style approach if absolute consistency is required. The problem: in some rare failure scenarios we lose availability of both systems until in-doubt transactions resolve. Consistency, Availability, Partitioning - pick 2. – djna Mar 15 '22 at 05:43