2

I’m designing a SQL data schema with many tables that have a compound primary key of (customer_id, id). The application will frequently need to run JOIN queries to assemble data. However, it should never run cross customer joins. (Strictly disallowing cross customer joins could be a useful security feature, but business requirements might change.)

I’m looking at distributed SQL databases. Ideally I’d like to know that data across tables for the same customer_id lives on the same shard so joins don’t need to be executed cross-shard.

I’ve read through the documentation of Vitess, YugabyteDB, and CockroachDB. Of the three I only feel confident that Vitess will store data from the same customer on the same node.

Am I missing a feature of YugabyteDB/CockroachDB?

Calebmer
  • 2,972
  • 6
  • 29
  • 36
  • 1
    YugabyteDB [blog post](https://blog.yugabyte.com/distributed-sql-essentials-sharding-and-partitioning-in-yugabytedb/) on how it distributes data reads *"The primary key or index column definition defines the sharding method, with the HASH or range (ASC or DESC) sharding. This is where we choose to distribute data on the hash column to load balance and avoid hot points. It’s also where we keep rows together on the ascending or descending columns for range scans (like ‘<‘, >’, ‘BETWEEN’ predicates)"* which suggests that the primary key determines data distribution amongst shards. – JNevill Jan 04 '22 at 14:32
  • 1
    However it also mentions, of its distribution architecture *"However, they don’t have the burden of managing them as multiple databases because the SQL layer is above. Joins and secondary indexes are not processed at this level because this prevents cross-shard transactions"* suggesting that for your `JOIN` scenario, it's distribution/sharding logic is superfluous since it doesn't run SQL at the shard/tablet level. – JNevill Jan 04 '22 at 14:33
  • 1
    This is in the works in YugabyteDB as "colocated databases" https://docs.yugabyte.com/latest/architecture/docdb-sharding/colocated-tables/ Along with tablegroups which is also in the works https://github.com/yugabyte/yugabyte-db/blob/d28d82e8c48bb068b538a3ea0e5165525e46838d/architecture/design/ysql-tablegroups.md – dh YB Jan 04 '22 at 14:37
  • Reading about YugabyteDB collocated tables now, thanks. From an initial read I don’t see how collocated tables could be horizontally sharded based on a shared partition keg column? Seems like they are a solution for storing small non-sharded tables together. – Calebmer Jan 04 '22 at 14:50
  • 1
    @Calebmer say you have 10K tenants. You create 100 tablegroups and distribute the tenants over those 100 groups. Now you can spread tablegroups across the whole cluster. Then you can maybe shard a single tablegroup to be spread across multiple servers. – dh YB Jan 04 '22 at 17:08

2 Answers2

1

there are two aspects in that: security and performance. For security (tenant isolation) you don't need to dedicate nodes if you have Row Level Security. Here is an example on YugabyteDB (distributed SQL database): https://dev.to/yugabyte/hyper-scale-multi-tenant-for-saas-an-example-with-pgbench-33a3 For performance (avoid cross-node transactions) you can use declarative partitioning on top of it. The partitions, like created in the blog post, can be tagged to specific nodes through cloud/region/zone topology

FranckPachot
  • 414
  • 4
  • 10
1

CockroachDB does not support collocated tables as requested here. This was previously supported in the form of interleaved tables but was deprecated recently as the implementation was hard to maintain and did not yield significant performance wins as expected. The performance aspects are discussed at length on this GH thread https://github.com/cockroachdb/cockroach/issues/52009 if you are interested in learning more.

Rima
  • 61
  • 4