
I run KairosDB on a 2-node Cassandra cluster with RF = 2, write CL = 1, read CL = 1. If both nodes are alive, the client sends half of the data to node 1 (e.g. metrics METRIC_1 to METRIC_5000) and the other half to node 2 (e.g. metrics METRIC_5001 to METRIC_10000). Ideally, each node always holds a copy of all data. But if one node is dead, the client sends all data to the node that is still alive.
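For clarity, the client-side split can be sketched like this (an illustrative sketch, not KairosDB code; the routing function and fallback logic are my own, only the endpoint names come from my setup):

```python
# Sketch of the client-side load balancing described above: a range split
# of metric names between two KairosDB endpoints, with failover to the
# surviving node when one endpoint is down.

NODES = ["node1.hdsrcluster", "node2.hdsrcluster"]

def pick_node(metric_name, alive):
    """Route METRIC_1..5000 to node 1, METRIC_5001..10000 to node 2,
    falling back to any alive node if the preferred one is down."""
    n = int(metric_name.split("_")[1])
    preferred = NODES[0] if n <= 5000 else NODES[1]
    if preferred in alive:
        return preferred
    fallback = [x for x in NODES if x in alive]
    if not fallback:
        raise RuntimeError("no KairosDB node available")
    return fallback[0]

both = set(NODES)
assert pick_node("METRIC_1", both) == "node1.hdsrcluster"
assert pick_node("METRIC_5001", both) == "node2.hdsrcluster"
# Node 2 dead: everything goes to node 1.
assert pick_node("METRIC_5001", {"node1.hdsrcluster"}) == "node1.hdsrcluster"
```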

The client started sending data to the cluster. After 30 minutes, I turned node 2 off for 10 minutes. During this 10-minute period, the client sent all data to node 1 properly. After that, I restarted node 2 and the client continued sending data to both nodes properly. One hour later I stopped the client.

I wanted to check whether the data sent to node 1 while node 2 was dead had been automatically replicated to node 2. To do this, I turned node 1 off and queried node 2 for the data written during the outage window, but it returned nothing. This made me think the data had not been replicated from node 1 to node 2. I posted a question: Doesn't Cassandra perform “late” replication when a node down and up again?. It seems the data was replicated automatically, but very slowly.

What I expect is that the data on both servers is the same (for redundancy purposes). That means data sent to the system while node 2 is dead must be replicated from node 1 to node 2 automatically once node 2 becomes available again (because RF = 2).
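Conceptually, the behaviour I expect looks like this (a toy sketch of coordinator-side catch-up, not actual Cassandra code; class and function names are made up for illustration):

```python
# Toy model: while a replica is down, the coordinator stores a "hint" for
# each missed write and replays the hints when the replica comes back,
# restoring the RF = 2 invariant.

class Replica:
    def __init__(self, name):
        self.name, self.up, self.data = name, True, {}

def write(key, value, replicas, hints):
    for r in replicas:
        if r.up:
            r.data[key] = value                                # normal replication
        else:
            hints.setdefault(r.name, []).append((key, value))  # store a hint

def on_node_restart(replica, hints):
    for key, value in hints.pop(replica.name, []):
        replica.data[key] = value                              # hint replay

node1, node2 = Replica("node1"), Replica("node2")
hints = {}

write("m1", 1.0, [node1, node2], hints)   # both up: replicated to both
node2.up = False
write("m2", 2.0, [node1, node2], hints)   # node2 down: hint stored instead
node2.up = True
on_node_restart(node2, hints)             # replay: node2 catches up
assert node2.data == {"m1": 1.0, "m2": 2.0}
```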

I have several questions here:

1) Is replication really that slow, or did I configure something wrong?

2) If the client sends half of the data to each node as described above, I think it's possible to lose data (e.g. node 1 receives data from the client and then suddenly goes down while it is still replicating that data to node 2). Am I right?

3) If I am right in 2), I plan to do the following: the client sends all data to both nodes. This would solve 2) and still take advantage of replication if one node dies and becomes available later. But I am wondering whether this causes data duplication, since both nodes receive the same data. Is there any problem here?
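For context on 3): Cassandra writes are upserts keyed by the primary key, so writing the same data point twice overwrites the same cell rather than creating a duplicate row. A minimal sketch (the row-key/timestamp representation here is hypothetical, not KairosDB's actual schema):

```python
# Model a Cassandra table as a map keyed by (row key, column timestamp).
# Writing the same point twice is an upsert: the second write lands on
# the same cell, so no duplicate appears.

table = {}  # (row_key, timestamp) -> value

def upsert(row_key, timestamp, value):
    table[(row_key, timestamp)] = value

# The client sends the same data point to both nodes:
upsert("METRIC_1", 1438646400, 42.0)
upsert("METRIC_1", 1438646400, 42.0)  # second copy overwrites the same cell

assert len(table) == 1                # still a single stored point
```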

Thank you!

duong_dajgja

1 Answer


Can you check the value of hinted_handoff_enabled in the cassandra.yaml config file?

For your question: yes, you may lose data in some cases until replication is fully achieved. Cassandra is not exactly doing late replication; there are three anti-entropy mechanisms (hinted handoff, read repair, and nodetool repair).

AFAIK, if you are running a version newer than 0.8, hinted handoff should replicate the missed data after the node restarts, without the need for a repair, unless the data is older than the hint window (which should not be the case for a 10-minute outage). I don't know why those hints were not sent to your replica node when it was restarted; it deserves some investigation.

Otherwise, when you restart the node, you can force Cassandra to make the data consistent by running a repair (e.g. by running nodetool repair).

From your description, I have the feeling you are confusing the coordinator node with the node that stores the data (even if both nodes hold the data, the distinction is important).

BTW, what is the client behaviour you describe, sharding metrics between node 1 and node 2? Neither KairosDB nor Cassandra works like that; is it your own client that is sending metrics to different KairosDB instances?

Cassandra does not partition on the metric name but on the row key (the partition key, to be exact, but with KairosDB they are the same). Every three weeks of data for each unique series is assigned a token based on a hash of that key, and this token is used for sharding/replication across the cluster. KairosDB is able to communicate with several nodes and will round-robin between them as coordinator nodes.
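The placement logic can be sketched like this (a hedged illustration: real Cassandra uses the Murmur3 partitioner and typically vnodes; crc32 and the row-key format below are stand-ins):

```python
import zlib

RING = ["node1", "node2"]  # the 2-node cluster

def token(row_key):
    # Stand-in for the partitioner's hash (really Murmur3 in Cassandra).
    return zlib.crc32(row_key.encode())

def replicas(row_key, rf=2):
    # SimpleStrategy-style placement: start at the token's position on
    # the ring and walk it until rf distinct replicas are chosen.
    start = token(row_key) % len(RING)
    return [RING[(start + i) % len(RING)] for i in range(rf)]

# With RF = 2 on a 2-node ring, every row key is placed on both nodes,
# regardless of which node acted as coordinator for the write.
row_key = "METRIC_1|2015-08-03"  # hypothetical KairosDB row key
assert set(replicas(row_key)) == {"node1", "node2"}
```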

I hope this helps.

Loic
  • "what is the client behaviour with metrics sharding between node 1 and node 2 you are describing?" -> I just want to do something like load balancing. "is it your own client that is sending metrics to different KairosDB instances?" -> in kairosdb.properties on node1 I just configured "kairosdb.datastore.cassandra.host_list=node1.hdsrcluster:9160" and in kairosdb.properties on node2 I just configured "kairosdb.datastore.cassandra.host_list=node2.hdsrcluster:9160". Then I run KairosDB with "kairosdb.sh start" on both nodes. Am I doing it the right way? – duong_dajgja Aug 04 '15 at 02:08
  • OK, I understand. Yes, this is fine; your client that is pushing the data is doing the load balancing. That makes me ask the question: did you check that the data was correctly populated into Cassandra while node 2 was off? – Loic Aug 04 '15 at 09:04
  • Yes, it was. But, as I mentioned, it was very slow. Maybe I need some faster way, or I need to run "nodetool repair" manually. BTW, let's forget my system model; could you suggest any KairosDB + Cassandra system model that can handle failover (i.e. it doesn't matter if one server goes down) and "late" replication as requirements? (i.e. if a server (e.g. node 2) goes down and comes up again, the "missed" data is synchronized from node 1 to node 2 quickly. I need this because if node 1 goes down after that, I won't lose any data) – duong_dajgja Aug 04 '15 at 10:34
  • And, with 2 nodes only. – duong_dajgja Aug 04 '15 at 10:46
  • As I said, I'm surprised that you did not get the data replicated quickly after rebooting node 2 with the hinted handoffs. Is the feature enabled in cassandra.yaml? What version of Cassandra are you using? If it's enabled, it sounds like a question for Cassandra developers. – Loic Aug 04 '15 at 11:52
  • After I stopped the client, I waited for about 2 hours and then stopped node 1. I queried node 2 for 600 data points with timestamps from when node 2 was dead, and it returned ~500 data points. This means some of the data had already been replicated and some had not. After that, I turned node 1 back on and ran "nodetool repair", waiting until it finished. Again, I turned off node 1 and queried the data from node 2. This time node 2 returned all 600 data points. What would be the problem here? Also, hinted handoff was enabled, and the hint window was set to 30 hours. – duong_dajgja Aug 04 '15 at 13:20
  • I don't know... I think it's a question for Cassandra experts; as far as I understand them, hinted handoffs should be sent to the node immediately when it's back in the cluster. By default the rate is 1 Mbit per second, so replicating your 600 points should be very quick. – Loic Aug 04 '15 at 14:06
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/85129/discussion-between-duong-dajgja-and-loic). – duong_dajgja Aug 04 '15 at 16:46
  • I found that if the GC grace period in cassandra.yaml is set to 0, this disables hinted handoffs... It might be interesting to check this parameter. – Loic Aug 18 '15 at 19:45