I have a question about partition distribution in Cassandra.
My problem is that my partitions are not evenly sized, and some partitions are accessed more often than others, so I'm afraid some of them will turn into hot spots sooner or later.
For example:
- I have two partitions: A and B.
- A has size 10 and B has size 5.
- I read the full A partition twice as often as I read B.
- I have three nodes (1, 2, and 3) with a replication factor of 2.
Results:
- Node 1 holds A: size 10, read load 1.0
- Node 2 holds B and A: size 15, read load 1.5
- Node 3 holds B: size 5, read load 0.5
Nodes 1 and 2 are overloaded.
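The arithmetic behind these numbers can be reproduced with a small Python model. It assumes, as the example implicitly does, that each partition's reads are split evenly across its two replicas, so A's read weight of 2 and B's read weight of 1 are each halved. The names `PARTITIONS` and `node_load` are made up for illustration and are not part of any Cassandra API.

```python
from collections import defaultdict

# Hypothetical model of the example above. Reads are relative weights
# (A is read twice as often as B). Assumption: each partition's reads
# are split evenly across its replicas.
PARTITIONS = {
    "A": {"size": 10, "reads": 2.0, "replicas": [1, 2]},
    "B": {"size": 5, "reads": 1.0, "replicas": [2, 3]},
}


def node_load(partitions):
    """Return (size per node, read load per node) for a replica placement."""
    size = defaultdict(int)
    reads = defaultdict(float)
    for p in partitions.values():
        share = p["reads"] / len(p["replicas"])  # even split across replicas
        for node in p["replicas"]:
            size[node] += p["size"]   # every replica stores the full partition
            reads[node] += share      # but serves only its share of the reads
    return dict(size), dict(reads)


sizes, reads = node_load(PARTITIONS)
for node in sorted(sizes):
    print(f"Node {node}: size {sizes[node]}, read {reads[node]:.1f}")
# Node 1: size 10, read 1.0
# Node 2: size 15, read 1.5
# Node 3: size 5, read 0.5
```

Changing the `replicas` lists shows how a different placement would shift the load between nodes.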
I started researching the problem and found the concept of virtual nodes (vnodes), but I'm not sure what it actually means.
Can a single partition key be assigned to several virtual nodes (one partition key -> n token ranges)?
Or can a partition key only be stored in one virtual node?
Do I have to split my keys myself by adding some partitioning info to the key (like random % 10 or something similar), or is there a way to make Cassandra do it automatically?
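For reference, the manual "random % 10" idea from the last question could look roughly like the sketch below. It is only a simulation in plain Python: an in-memory dict stands in for a table whose partition key would be the pair (key, bucket), and no Cassandra driver is involved. `NUM_BUCKETS`, `write` and `read_all` are hypothetical names chosen for illustration.

```python
import random
from collections import defaultdict

NUM_BUCKETS = 10  # hypothetical bucket count (the "% 10" from the question)

# In-memory stand-in for a table whose partition key is (key, bucket).
# This is a simulation, not a Cassandra client.
table = defaultdict(list)


def write(key, row, rng=random):
    """Store a row under a randomly chosen bucket of the logical key."""
    # A random bucket spreads one logical key's rows over NUM_BUCKETS
    # physical partitions instead of a single one.
    bucket = rng.randrange(NUM_BUCKETS)
    table[(key, bucket)].append(row)
    return bucket


def read_all(key):
    """Read the whole logical partition by querying every bucket."""
    rows = []
    for bucket in range(NUM_BUCKETS):
        # .get avoids creating empty entries in the defaultdict
        rows.extend(table.get((key, bucket), []))
    return rows


rng = random.Random(42)
for i in range(100):
    write("A", i, rng)
print(sorted(read_all("A")) == list(range(100)))  # True
```

The trade-off is visible in `read_all`: reading the full logical partition now takes one query per bucket instead of a single partition read.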