24

Does Apache Cassandra support sharding?

Apologize that this question must seem trivial, but I cannot seem to find the answer. I have read that Cassandra was partially modeled after GAE's Big Table which shards on a massive scale. But most of the documentation I'm currently finding on Cassandra seems to imply that Cassandra does not partition data horizontally across multiple machines, but rather supports many many duplicate machines. This would imply that Cassandra is a good fit high availability reads, but would eventually break down if the write volume became very very high.

Chris Dutrow
  • 48,402
  • 65
  • 188
  • 258

2 Answers2

33

Cassandra does partition across nodes (because if you can't split it you can't scale it). All of the data for a Cassandra cluster is divided up onto "the ring" and each node on the ring is responsible for one or more key ranges. You have control over the Partitioner (e.g. Random, Ordered) and how many nodes on the ring a key/column should be replicated to based on your requirements.

This contains a pretty good overview. Basic architecture

Also, I highly recommend reading the Dynamo white paper. While Cassandra is different than Dynamo in many ways, conceptually they stem from the same roots. Check it out: Dynamo White Paper

Matt Self
  • 19,520
  • 2
  • 32
  • 36
  • Ok, key question: Can Cassandra be queried using greater than and less than operators in Olog(n) time? – Chris Dutrow May 08 '13 at 17:39
  • 2
    This depends on whether you've used Random or Ordered Partitioner. Random Partitioner will evenly distribute across nodes, so it is possible that a range query would need to hit most/all nodes to retrieve the data... so perhaps O(n). With Ordered Partitioner Cassandra can determine exactly which nodes to query and return everything on the ring in between, but this is done at the cost of even data distribution (i.e. hello hotspots). There are ways to accomplish range queries (e.g. build your own index where your row key is a column). This warrants another question/discussion in itself. – Matt Self May 08 '13 at 20:16
  • Does the partitioned data get replicated on all the nodes (the ones setup for replication)? – user3587180 May 08 '17 at 18:24
  • 1
    @MattSelf Basic architecture URL is broken in your post. Base URL isn't active either, please edit the post is possible, TY. – realPK Nov 29 '18 at 22:06
-5

yes, cassandra supports sharding, but in its own way.

In Mongodb each secondary node contains full data of primary node but in Cassandra, each secondary node has responsibility of keeping only some key partitions of data.

Matt Self
  • 19,520
  • 2
  • 32
  • 36
khanmizan
  • 926
  • 9
  • 17
  • 9
    You are conflating MongoDB [*replication*](http://docs.mongodb.org/manual/replication/) (where secondaries contain a full copy of the data for redundancy) with [*sharding*](http://docs.mongodb.org/manual/sharding/) (partitioning of a logical database across a cluster of machines). Generally if you are sharding you would also want to have each shard backed by a replica set, but the two concepts are in fact orthogonal. – Stennie May 08 '13 at 04:46