0

Does federation lead to horizontal scaling? We know that relational databases usually are scaled vertically as horizontal scaling is not feasible and can disrupt the entire databases. One of the main reasons why it is not possible as we tend to have join's in the queries and hence all the tables need to be present in the same server. However I came across the topic federation under relational database, it states that federation is the process of splitting a database by functions. For example, instead of a single, monolithic database, you could have three databases: forums, users, and products, resulting in less read and write traffic to each database and therefore less replication lag.

So my question is "Is federation a means of achieving horizontal scaling in RDMS"?

Rosy Roy
  • 3
  • 2

1 Answers1

0

Federation as you describe would split data over more servers, but probably not evenly. That is, in your example of forums, users, and products, I expect there are more users than products, and more forum messages than users. If federation is your only means of horizontal scaling, then it's only a matter of time (very short time) before one of those exceeds the capacity of its server.

Horizontal scaling by sharding is more common, because it gives you the opportunity to split data in a manner that is closer to equal portions per server.

You correctly point out that joins make sharding pretty hard.

You can still join on a single database server if the rows that are related reside on the same server. For example, if you had a table orders and a table lineitems that has a one-to-many relationship with orders, you could colocate the lineitems on the same server with its parent order record.

In a relational database, the relationships will get more complex very quickly. In your users and forums example, it's a many-to-many relationship. A given forum has messages from many users, and a given user may post messages on many forums. So how do you ensure the related rows are colocated? You can't.

So some types of relationships (notably many-to-many relationships) can't be sharded in a way that allows joins on the same server.

You can then choose one of these difficult options:

  • Bear with the fact that some joins must be resolved by fan-out queries that fetch rows from multiple shards, and resolve the join in the application.

  • Prohibit many-to-many relationships in your data model.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Taking a small example of many to many relationship like, customer and products. A customer can buy many product, Similary many products can be bought by a customer. How can such many to many relationships be prohibited. Taking this example further, if I have a RDBMS like MySQL and I can no longer vertically scale it, I have to either switch to noSQL like mongoDB or else horizontally scale it. How can i go ahead with the second option of horizontally scaling it. – Rosy Roy Aug 19 '22 at 17:25
  • Adding to it, Incase I have a query such that find all the product with ID 1 which is bought by customer XYZ, In such cases I will have to run the query across all the servers and then the aggregate will be my solution. Isnt this making the entire process very inefficient? Is there any other method to acheive a better solution. Tables : Customer in server 1 and Products in server 2. – Rosy Roy Aug 19 '22 at 17:27
  • Yes, that's called a fan-out query, when it must search every server in a shard set. You're right, it is less efficient to do that. You need to design your data model so the queries you need to be more efficient are more likely to be single-shard queries, so the uncommon queries are the ones that sacrifice efficiency. It's a hard choice! – Bill Karwin Aug 19 '22 at 18:07
  • The answer to your question about how to prohibit many-to-many queries is: _denormalization_. You sacrifice the benefits of relational normalization. This is the solution of all NoSQL type databases. – Bill Karwin Aug 19 '22 at 18:08