I run KairosDB on a 2-node Cassandra cluster with RF = 2, write CL = ONE and read CL = ONE. When both nodes are alive, the client sends half of the data to node 1 (e.g. metrics METRIC_1 to METRIC_5000) and the other half to node 2 (e.g. metrics METRIC_5001 to METRIC_10000). Ideally, each node always has a copy of all the data. If one node is dead, the client sends all data to the node that is still alive.
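The client-side routing described above can be sketched roughly like this. It is a hypothetical illustration of the client's logic only: the node names, `SPLIT` and `route()` are made up for the example, and the sketch does not model what Cassandra does with a write after a node receives it.

```python
# Hypothetical sketch of the client's routing rule: metrics are split by their
# numeric suffix between two nodes; if the preferred node is down, the point
# goes to whichever node is still alive.
NODES = ("node1", "node2")  # hypothetical node identifiers
SPLIT = 5000                # METRIC_1..METRIC_5000 -> node1, the rest -> node2


def route(metric: str, alive: set[str]) -> str:
    """Return the node the client should send `metric` to."""
    index = int(metric.rsplit("_", 1)[1])
    preferred = NODES[0] if index <= SPLIT else NODES[1]
    if preferred in alive:
        return preferred
    # Fall back to any alive node (in a 2-node cluster: the other one).
    for node in NODES:
        if node in alive:
            return node
    raise RuntimeError("no alive node to send to")


if __name__ == "__main__":
    both = {"node1", "node2"}
    print(route("METRIC_42", both))         # node1
    print(route("METRIC_7000", both))       # node2
    print(route("METRIC_7000", {"node1"}))  # node1 (node2 is down)
```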
The client started sending data to the cluster. After 30 minutes, I turned node 2 off for 10 minutes. During this period, the client correctly sent all data to node 1. I then restarted node 2, and the client resumed sending data to both nodes. One hour later, I stopped the client.
I wanted to check whether the data sent to node 1 while node 2 was down had been automatically replicated to node 2. To do this, I turned node 1 off and queried node 2 for data from the time window when node 2 had been down, but the query returned nothing. This made me think the data had not been replicated from node 1 to node 2, so I posted a question about it ("Doesn't Cassandra perform “late” replication when a node down and up again?"). It seems that the data was in fact replicated automatically, but very slowly.
What I expect is that both servers hold the same data (for redundancy). That means the data sent to the system while node 2 is down must be replicated automatically from node 1 to node 2 once node 2 becomes available again (because RF = 2).
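For reference, the commonly cited Cassandra condition for a read to be guaranteed to see the latest acknowledged write is R + W > RF, i.e. the set of replicas a read contacts must overlap the set that acknowledged the write. With the settings above (RF = 2, write CL = ONE, read CL = ONE) that gives 1 + 1 = 2, which does not exceed 2. The sketch below checks the rule by brute force in a deliberately simple model; the function names are hypothetical, and the model ignores hinted handoff, repair and timing.

```python
from itertools import combinations


def read_sees_write(rf: int, w: int, r: int) -> bool:
    """Rule of thumb: reads of r replicas always overlap writes to w replicas."""
    return r + w > rf


def overlap_guaranteed(rf: int, w: int, r: int) -> bool:
    """Brute force: does every w-replica write set share a replica with
    every r-replica read set, out of rf replicas in total?"""
    replicas = range(rf)
    return all(
        set(ws) & set(rs)  # a non-empty intersection is truthy
        for ws in combinations(replicas, w)
        for rs in combinations(replicas, r)
    )


if __name__ == "__main__":
    print(read_sees_write(rf=2, w=1, r=1))     # False: ONE/ONE with RF = 2
    print(overlap_guaranteed(rf=2, w=1, r=1))  # False: write on node 1, read on node 2
    print(read_sees_write(rf=2, w=2, r=1))     # True: write to both, read from one
```

In the experiment described above, node 1 was off during the query, so a read at CL = ONE could only be answered by node 2, which is exactly the non-overlapping case.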
I have several questions here:
1) Is the replication really this slow, or did I misconfigure something?
2) If the client sends half of the data to each node, as described above, I think it's possible to lose data (e.g. node 1 receives data from the client and then suddenly goes down while it is still replicating that data to node 2). Am I right?
3) If I am right about 2), I plan to do the following: the client sends all data to both nodes. This should solve 2) and still take advantage of replication when one node goes down and comes back later. However, I am wondering whether this would cause duplicated data, because both nodes receive the same data. Is there any problem with this approach?
Thank you!