Cassandra uneven partitions and hotspots

Question

I have a question about partition distribution in Cassandra.

My problem is that my partitions are not evenly sized, and some partitions are accessed more often than others, so I'm afraid I'll end up with hot spots sooner or later.

For example:

I have two partitions: A and B.
The size of A is 10; the size of B is 5.
I read the full A partition twice as often as I read B.
I have three nodes (1, 2, and 3), with replication factor 2.

Results:

Node 1 (A), Node 2 (B, A), Node 3 (B)
Node 1 size is 10, read 1.0
Node 2 size is 15, read 1.5
Node 3 size is 5, read 0.5

My nodes 1 and 2 are overloaded.
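
The figures above can be reproduced with a short sketch. It assumes that reads of a partition are spread evenly across its replicas, which is a simplification (the actual split depends on the consistency level and the driver's load-balancing policy). The dictionary names are illustrative only.

```python
from collections import defaultdict

# Figures taken from the example above.
partition_size = {"A": 10, "B": 5}
partition_reads = {"A": 2.0, "B": 1.0}   # A is read twice as often as B
replicas = {"A": [1, 2], "B": [2, 3]}    # replication factor 2

node_size = defaultdict(float)
node_reads = defaultdict(float)

for partition, nodes in replicas.items():
    for node in nodes:
        # Every replica stores a full copy of the partition.
        node_size[node] += partition_size[partition]
        # Assumption: reads are shared equally among the replicas.
        node_reads[node] += partition_reads[partition] / len(nodes)

for node in sorted(node_size):
    print(f"Node {node}: size {node_size[node]:g}, read {node_reads[node]:g}")
```

Under that assumption, node 2 carries the largest share of both data and reads because it holds a replica of each partition.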

I started researching my problem and found the Virtual Nodes concept, but I'm not too sure what it actually means.

Will a single partition key be assigned to different virtual nodes (1 partition key -> n token ranges)?

Can one partition key only be stored in a single virtual node?

Do I have to partition my keys myself by adding some partition info (like random % 10 or something), or is there a way to make Cassandra do it automatically?

Take a look here for a better understanding: http://stackoverflow.com/questions/25615978/not-quite-clear-about-a-cassandras-anti-pattern/25617670#25617670 (comment by Carlo Bertuccini, Apr 21 '15 at 14:10)

score 2 · Answer 1 · answered Apr 23 '15 at 19:44

Will a single partition key be assigned to different virtual nodes (1 partition key -> n token ranges)?

No. Each partition key is mapped to only one virtual node and its replicas.

To avoid hotspots, it is useful to add a sharding key (random number % n) to the partition key, so that one logical partition is spread over n physical partitions. Otherwise, try to choose a partition key that does not cause hotspots in the first place.
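
A minimal sketch of the sharding-key idea, in plain Python without a driver. The names `NUM_SHARDS`, `sharded_key`, and `all_shards`, and the table layout in the comment, are hypothetical and only illustrate the approach; the shard count would need tuning for a real workload.

```python
import random

# Hypothetical shard count (the "n" in "random number % n").
NUM_SHARDS = 10

# Assumed table layout, with the shard as part of a composite partition key:
#   CREATE TABLE events (
#       base_key text, shard int, ts timestamp, payload text,
#       PRIMARY KEY ((base_key, shard), ts)
#   );

def sharded_key(base_key, num_shards=NUM_SHARDS):
    """Pick a random shard for a write; (base_key, shard) is the partition key."""
    return (base_key, random.randrange(num_shards))

def all_shards(base_key, num_shards=NUM_SHARDS):
    """List every partition key that a read of the whole logical partition must query."""
    return [(base_key, shard) for shard in range(num_shards)]

if __name__ == "__main__":
    print(sharded_key("A"))      # writes land on one of the shards at random
    print(all_shards("A"))       # reads fan out over all shards
```

The trade-off is on the read side: writes for "A" are spread over `NUM_SHARDS` partitions, which may hash to different token ranges, but reading all of "A" now takes one query per shard, with the results merged by the client.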
