I have a question about partition distribution in Cassandra.

My problem is that my partitions are not evenly sized, and some partitions are accessed more often than others, so I'm afraid I'll end up with hot spots on some partitions sooner or later.

For example:

  1. I have two partitions: A and B.
  2. The size of A is 10, and the size of B is 5.
  3. I read the full A partition twice as often as I read B.
  4. I have three nodes (1, 2, and 3), with replication factor 2.

Results:

  • Placement: Node 1 (A), Node 2 (B, A), Node 3 (B)
  • Node 1 size is 10, read 1.0
  • Node 2 size is 15, read 1.5
  • Node 3 size is 5, read 0.5

My nodes 1 and 2 are overloaded.
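
To make the arithmetic explicit, here is a rough Python sketch that reproduces those figures. It assumes reads of a partition are split evenly between its two replicas, and the `partitions` layout and `node_load` helper are made up for illustration only:

```python
from collections import defaultdict

# Figures from the example above. The even split of reads across
# replicas is an assumption, not something Cassandra guarantees.
partitions = {
    "A": {"size": 10, "reads": 2.0, "replicas": [1, 2]},
    "B": {"size": 5, "reads": 1.0, "replicas": [2, 3]},
}

def node_load(partitions):
    """Return {node: (stored size, read share)} for the given placement."""
    size = defaultdict(float)
    reads = defaultdict(float)
    for p in partitions.values():
        for node in p["replicas"]:
            # Every replica stores the full partition.
            size[node] += p["size"]
            # Reads are assumed to be spread evenly over the replicas.
            reads[node] += p["reads"] / len(p["replicas"])
    return {node: (size[node], reads[node]) for node in sorted(size)}

load = node_load(partitions)
for node, (stored, read_share) in load.items():
    print(f"Node {node}: size {stored:g}, read {read_share:g}")
```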

I started researching the problem and found the concept of virtual nodes, but I'm not sure what it actually means.

Will a single partition key be assigned to different virtual nodes (1 partition key -> n token ranges)?

Or can one partition key only be stored in a single virtual node?

Do I have to shard my keys myself by adding some partition info (like a random % 10 or something), or is there a way to make Cassandra do it automatically?

albertredneck
Take a look here for a better understanding: http://stackoverflow.com/questions/25615978/not-quite-clear-about-a-cassandras-anti-pattern/25617670#25617670 – Carlo Bertuccini Apr 21 '15 at 14:10

1 Answer

Will a single partition key be assigned to different virtual nodes (1 partition key -> n token ranges)?

No. Each partition key hashes to a single token, so it is mapped to only one virtual node and its replicas.
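
As a rough illustration of that mapping, here is a simplified Python sketch of a token ring. It is not Cassandra's actual code: MD5 stands in for the default Murmur3 partitioner, vnode tokens are derived from made-up names, replica placement just walks the ring clockwise, and `token_for`, `build_ring`, and `replicas_for` are hypothetical helpers:

```python
import bisect
import hashlib

def token_for(partition_key: str) -> int:
    # Stand-in hash: Cassandra's default partitioner is Murmur3, not MD5.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def build_ring(nodes, vnodes_per_node):
    """Return sorted (token, node) pairs; each pair is one virtual node."""
    ring = []
    for node in nodes:
        for i in range(vnodes_per_node):
            ring.append((token_for(f"{node}-vnode-{i}"), node))
    return sorted(ring)

def replicas_for(partition_key, ring, replication_factor):
    """Find the vnode owning the key's token, then walk clockwise for replicas."""
    tokens = [token for token, _ in ring]
    # First vnode whose token is >= the key's token, wrapping around the ring.
    start = bisect.bisect_left(tokens, token_for(partition_key)) % len(ring)
    owners = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in owners:
            owners.append(node)
        if len(owners) == replication_factor:
            break
    return owners

ring = build_ring(["node1", "node2", "node3"], vnodes_per_node=8)
print(replicas_for("A", ring, replication_factor=2))
```

However many vnodes each node has, the same key always lands in exactly one token range; vnodes spread the ranges over the nodes more evenly, but they never split a single partition.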

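To avoid hotspots, it is useful to add a sharding component (for example, random number % n) to the partition key, so that one logical partition is spread over n physical partitions. Cassandra does not do this for you automatically. Otherwise, try to choose a partition key that does not cause hotspots in the first place.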
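
A minimal sketch of the sharding idea, in Python. The names `NUM_SHARDS`, `write_key`, and `read_keys` are hypothetical, and the composite key is modelled as a plain tuple rather than an actual table definition:

```python
import random

NUM_SHARDS = 10  # hypothetical n; tune it to your data and read pattern

def write_key(base_key: str, num_shards: int = NUM_SHARDS) -> tuple:
    """Composite partition key for a write: (base key, random shard)."""
    return (base_key, random.randrange(num_shards))

def read_keys(base_key: str, num_shards: int = NUM_SHARDS) -> list:
    """Every composite key that has to be queried to read all the data back."""
    return [(base_key, shard) for shard in range(num_shards)]

print(write_key("A"))       # one of the ten ("A", shard) keys
print(len(read_keys("A")))  # number of partitions to query on a full read
```

The trade-off is on the read side: each (key, shard) pair hashes to its own token and can live on different nodes, but reading the whole logical partition now means querying all n shards and merging the results in the client.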