
I am using Cassandra 2.2.4. I have a table with replication factor 3, but I have only 2 nodes. The used disk space on these 2 nodes is different (the 1st node uses 10 GB and the 2nd node uses 14 GB). What is the reason for this difference?

Can anyone please help me?

Ajmal Sha
  • What's the reason behind RF 3 for 2 nodes? Used disk space won't always be the same, but in your case maybe your second node is becoming a hot spot. – Anower Perves Feb 28 '17 at 07:36
  • Actually we need 3 replicas, which is why we set RF to 3 with the 2 nodes available initially. We decided to add one extra node in the future. – Ajmal Sha Feb 28 '17 at 08:39
  • If you need RF 3, you can alter the keyspace to RF 3 later, after adding the new node. It is not necessary to set it now. For two nodes, use RF 2 at most, because there is no node on which to store the third replica. – Anower Perves Feb 28 '17 at 08:46
  • Does that mean the disk space difference is not related to the mismatch between the number of nodes and the RF? – Ajmal Sha Feb 28 '17 at 09:01
  • Yes, it's not related. I would suggest you wait a bit. If you see that the difference in usage between the two nodes keeps increasing, that confirms your second node is a hot spot. You can get some help from [this](http://stackoverflow.com/questions/29772159/cassandra-uneven-partitions-and-hotspots) post. – Anower Perves Feb 28 '17 at 09:05
  • OK, thanks. One more doubt, can you please help? What is a hot spot, and why does my second node act as a hot spot? – Ajmal Sha Feb 28 '17 at 09:54
  • In a distributed system, data should be distributed almost equally. When a single node keeps storing more and more data while the other nodes store less, the node holding the bulk of the data is called a hot spot. There may be several reasons for a hot spot. If it occurs and the cluster does not depend heavily on that node, we usually remove the node from the cluster using **nodetool decommission** and add it back later. If removing the node would hamper your consistency, we first add another node and then remove the hot spot node, so that consistency is maintained. – Anower Perves Feb 28 '17 at 10:05
  • So a hot spot occurs only if the RF is less than the number of nodes? Is the additional data on the hot spot node also present on the other nodes? – Ajmal Sha Feb 28 '17 at 10:30
  • Could be. Add the node you already decided to add, or reduce the RF. – Anower Perves Feb 28 '17 at 10:40
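
The point made in the comments about RF 3 on two nodes can be sketched in a few lines of Python. This is a simplified model and not Cassandra's actual code: `place_replicas` and the `ring` list are hypothetical names, and the logic only mimics the general idea of walking the ring and picking distinct nodes until the replication factor is reached.

```python
def place_replicas(ring, start, rf):
    """Pick up to `rf` distinct nodes, walking the ring clockwise from `start`.

    Simplified model of ring-based replica placement; `ring` is a
    hypothetical ordered list of node names, not a real Cassandra API.
    """
    replicas = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == rf:
            break
    return replicas


two_nodes = ["node1", "node2"]
three_nodes = ["node1", "node2", "node3"]

# With 2 nodes, RF 3 can only ever yield 2 replicas (one per node).
print(place_replicas(two_nodes, 0, 3))
# With 3 nodes, all 3 replicas find a distinct node.
print(place_replicas(three_nodes, 0, 3))
```

Under this model, each of the two nodes ends up with one replica of every partition, so the third replica simply has nowhere to go until a third node joins.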

1 Answer


Even if you had replication factor 1, the disk space might still differ. This is because some partitions are stored on one node and others on the other node.
If more data belongs to partition A, then the node that holds partition A will have more data.
The partition is determined from the partition key, which is the first part of the primary key. This is why it's so important to have a good primary key. You can watch the tutorials on the DataStax website for details on how to choose the best data model and primary key: https://academy.datastax.com/courses.
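
As a rough illustration of the RF 1 case described above, here is a minimal Python sketch. Everything in it is an assumption for illustration: the node names, the partition keys, and the sizes are made up, MD5 is only a stand-in for Cassandra's real partitioner, and the modulo mapping is a simplification of token ranges.

```python
import hashlib
from collections import Counter

NODES = ["node1", "node2"]  # hypothetical two-node cluster


def owner(partition_key: str) -> str:
    """Map a partition key to a node by hashing it (MD5 is a stand-in here)."""
    token = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return NODES[token % len(NODES)]


def bytes_per_node(partition_sizes: dict) -> Counter:
    """Sum the size of every partition on the node that owns it."""
    usage = Counter({node: 0 for node in NODES})
    for key, size in partition_sizes.items():
        usage[owner(key)] += size
    return usage


# Hypothetical partitions: one is far larger than the rest.
sizes = {"sensor-a": 9_000, "sensor-b": 500, "sensor-c": 400, "sensor-d": 100}
print(bytes_per_node(sizes))
```

Whichever node owns the large `sensor-a` partition carries most of the data, no matter how evenly the remaining keys hash. A partition key that spreads rows over many similarly sized partitions avoids this.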

timeFly
  • OK, thanks. One more doubt: is there any chance of two replicas being stored on the same node? – Ajmal Sha Feb 28 '17 at 12:42
  • No, 2 replicas cannot be on the same node. In your case, even though you have a replication factor of 3, the data is replicated only 2 times, one replica on each node. There won't be a node with a 3rd replica. – timeFly Mar 01 '17 at 16:15
  • This is also related to the consistency level. For now I guess you have consistency level ONE, which I think is the default, so everything works. If you use consistency level ALL with a replication factor greater than the number of nodes, I think you will get an exception, because the request needs all 3 replicas to respond and the 3rd node does not exist. – timeFly Mar 01 '17 at 16:21
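
The interaction between consistency level and replication factor in the last comment can be worked through with a small Python sketch. The helper names `required_replicas` and `can_satisfy` are hypothetical, and the model covers only ONE, QUORUM, and ALL; it assumes the usual rule that QUORUM needs `RF // 2 + 1` replicas and ALL needs `RF` replicas.

```python
def required_replicas(level: str, rf: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def can_satisfy(level: str, rf: int, live_nodes: int) -> bool:
    """True if enough replicas can exist and respond (simplified model)."""
    available = min(live_nodes, rf)  # at most one replica per node
    return available >= required_replicas(level, rf)


# RF 3 on a 2-node cluster, as in the question.
for level in ("ONE", "QUORUM", "ALL"):
    print(level, can_satisfy(level, rf=3, live_nodes=2))
```

Under these assumptions, ONE and QUORUM can be met with RF 3 on two live nodes, while ALL cannot, because it would need a third replica to respond.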