This is a design-level question.

I have a three-node setup (nodes N1, N2 and N3) where my application and database (for now, assume Cassandra) run on all three nodes.

I need to guarantee data consistency in the following scenario. Could someone help?

  • Thread (T1) tries to edit the data in Node N1
  • Thread (T2) tries to edit the same data from Node N2
  • Only one write should succeed

In this case, what will happen in Cassandra?

Is there a way to provide this concurrency control in the application or in Cassandra itself? Are there any algorithms for it?

I am looking for options other than lightweight transactions (LWT) in Cassandra.

Oresztesz
Harry
  • Cassandra breaks up a row into columns. If there are multiple updates on the same row but on different columns, there will be no problem, since the updates are independent. The problem appears when the threads change the same column cell. In that case, last write wins: each mutation (update, delete) has a timestamp associated with it, and Cassandra picks the most recent timestamp. Reading these might help: https://stackoverflow.com/questions/34898693/why-cassandra-cluster-need-synchronized-clocks-between-nodes and https://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks – Horia Nov 23 '17 at 09:14
  • Thanks @Horia, I understand that, but I would like to know: is there a better technique to handle this in the application to provide concurrency control? – Harry Nov 23 '17 at 10:39
  • Why doesn't LWT work for you? – Simon Fontana Oscarsson Nov 23 '17 at 11:19
  • It has a lot of performance issues on writes, and it won't support partitions. Do you have any suggestions for application-layer concurrency control? – Harry Nov 23 '17 at 12:12
  • @Harry As long as you only read/write to one partition, it will not have any big performance issues. So in your case, where you only write one value, it's fine. What you want is exactly what Oresztesz answered below: use CL=QUORUM, as the majority of nodes will always have the latest value. That, paired with LWT, will make sure the data doesn't change while writing. – Simon Fontana Oscarsson Nov 23 '17 at 14:43
  • @SimonFontanaOscarsson awesome, I asked a question as a reply to Oresztesz – Harry Nov 23 '17 at 14:47

1 Answer

Cassandra offers tunable consistency. In your case this means that if you use CL=QUORUM for writes, a write is acknowledged only once 2 out of 3 nodes have accepted it. Reads with CL=QUORUM are then consistent, because a read also gets its results from 2 out of 3 nodes, so the read set and the write set always overlap in at least one node.
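The overlap argument is plain arithmetic, and a small sketch may make it easier to check other combinations. The function names below are made up for illustration and are not part of any Cassandra driver; the sketch assumes the replication factor equals the number of nodes (3).

```python
def quorum(replication_factor: int) -> int:
    """Size of a quorum: floor(RF / 2) + 1 replicas."""
    return replication_factor // 2 + 1


def read_overlaps_write(rf: int, write_acks: int, read_replicas: int) -> bool:
    """True when any set of read replicas must share at least one
    replica with any set of replicas that acknowledged the write."""
    return write_acks + read_replicas > rf


rf = 3
q = quorum(rf)
print(q)                               # 2
print(read_overlaps_write(rf, q, q))   # True:  QUORUM write + QUORUM read
print(read_overlaps_write(rf, 1, 1))   # False: ONE write + ONE read
print(read_overlaps_write(rf, 3, 1))   # True:  ALL write + ONE read
```

Note that the overlap only guarantees that a read sees the latest acknowledged write. It does not decide which of two concurrent writes survives.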

For concurrent writes Cassandra uses a last-write-wins mechanism. This means that, regardless of the consistency level, both writes are accepted, and a reader will see either T1's or T2's write, depending on when the read happens. Later on, readers will see only the write with the latest timestamp.
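As a rough illustration of last-write-wins, here is a simplified sketch of reconciling two versions of the same cell. The `Cell` class and `reconcile` function are hypothetical names, not Cassandra code, and the tie-break on equal timestamps is a simplifying assumption (deletes are ignored entirely).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # write timestamp, e.g. microseconds since epoch


def reconcile(a: Cell, b: Cell) -> Cell:
    """Return the surviving cell: the higher timestamp wins.
    On a tie, this sketch keeps the larger value (an assumption)."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.value >= b.value else b


t1_write = Cell(value="from T1", timestamp=1000)
t2_write = Cell(value="from T2", timestamp=1001)

# The order in which a replica receives the writes does not matter.
print(reconcile(t1_write, t2_write).value)  # from T2
print(reconcile(t2_write, t1_write).value)  # from T2
```

Neither thread is told that it lost: T1's write was acknowledged and then silently superseded, which is why last-write-wins alone cannot provide "only one write should succeed".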

If you want a locking mechanism, you can use offline concurrency patterns in your application layer, such as an optimistic or pessimistic offline lock. Some persistence frameworks offer implementations of these patterns out of the box.
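To show the idea behind an optimistic offline lock, here is a minimal in-memory sketch. `VersionedStore` and `StaleVersionError` are hypothetical names, and the store is only a stand-in for a database. The sketch assumes the version check and the write happen atomically, which a `threading.Lock` provides here; a plain read-then-write against Cassandra is not atomic, so a real implementation would need something that supplies that guarantee.

```python
import threading


class StaleVersionError(Exception):
    """Raised when the row changed after the caller read it."""


class VersionedStore:
    """Hypothetical in-memory store: each row is (value, version)."""

    def __init__(self):
        self._rows = {}
        self._guard = threading.Lock()  # makes check-and-write atomic

    def read(self, key):
        with self._guard:
            return self._rows.get(key, (None, 0))

    def write(self, key, value, expected_version):
        with self._guard:
            _, current = self._rows.get(key, (None, 0))
            if current != expected_version:
                raise StaleVersionError(
                    f"expected version {expected_version}, found {current}"
                )
            self._rows[key] = (value, current + 1)
            return current + 1


store = VersionedStore()
_, version_seen = store.read("row-1")   # T1 and T2 both read version 0

store.write("row-1", "from T1", version_seen)      # T1 wins
try:
    store.write("row-1", "from T2", version_seen)  # T2 holds a stale version
except StaleVersionError as err:
    print("T2 rejected:", err)

print(store.read("row-1"))  # ('from T1', 1)
```

The losing thread gets an explicit error and can re-read the row and retry or give up, instead of being silently overwritten.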

Oresztesz
  • Thanks for your points. Could you answer two questions? I don't understand the following two operations. 1) Let's say nodes N1, N2 and N3 are my setup, with a replication factor of 3, a write consistency of 3 and a read consistency of 1. What will happen while a write is ongoing (say it has completed successfully on node 1 and is still in progress on node 2)? – Harry Nov 23 '17 at 15:03
  • NOTE: I know it works by writing in quorum, like writing in Node N1 quorum and replicating the quorum in Node N2 and then to Node N3. During the ongoing write, if another READ / WRITE happens for the same row, will LWT solve the issue, and how? Example: if I use an update with an IF statement, will it block all the writes and reads on the nodes? 2) Is the IF statement the only indication of an LWT? – Harry Nov 23 '17 at 15:03
  • @SimonFontanaOscarsson – Harry Nov 23 '17 at 15:05
  • @Harry I'm not sure I got your first question. There is nothing called write consistency or read consistency. Consistency level (CL) is something you set on the session on the client side. So if you want CL 3 for writes and CL 1 for reads, that is something you have to change on the client before each operation. I also don't think CL 3 or ALL is a good CL in most cases, because that will give you less availability in Cassandra. A common misconception is that CL means you only read/write on that many nodes. That is true for reads but NOT for writes. Cassandra will always try to write to all nodes – Simon Fontana Oscarsson Nov 23 '17 at 23:33
  • but as soon as CL nodes respond successfully, the operation is successful. Imagine you have CL ALL for a write operation and one node goes down or becomes heavily loaded. Now you can't write any data at all. Cassandra has other mechanisms to make sure data that is missing will still get replicated when that node becomes available; this is done with hints and repair. These are important concepts that require some understanding. – Simon Fontana Oscarsson Nov 23 '17 at 23:38
  • @SimonFontanaOscarsson, I should be able to set WC as 3 and RC as 1 on the client side for every operation: http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html . I know about hinted handoff. I want to know what happens under heavy load when a write is ongoing on node 2: if another write / read for the same row comes from a different thread during that time, how will Cassandra react? – Harry Nov 23 '17 at 23:41
  • Also, does it mean that just using the IF statement in a query will initiate an LWT? – Harry Nov 23 '17 at 23:42
  • @Harry What I meant by saying WC and RC don't exist is that there is no actual command in CQL or in the driver that gives a CL for only reads or only writes; you can only set CL for all operations, afaik. Anyway, for LWT you actually have to set CL SERIAL for all LWT operations. You also have to make sure that those column writes only ever use CL SERIAL, or else regular writes can change a value that is part of an LWT condition at a time between evaluating the condition and applying the update. I think Stefan Podkowinski can explain this better: – Simon Fontana Oscarsson Nov 23 '17 at 23:56
  • https://stackoverflow.com/questions/34790674/what-are-the-implications-of-using-lightweight-transactions – Simon Fontana Oscarsson Nov 23 '17 at 23:56
  • If you have any more questions you can drop them here and I'll try to answer. If you want to know exactly how things work in LWT I would recommend asking your questions on the user mailing list. It is very active and many devs check there daily. http://cassandra.apache.org/community/ – Simon Fontana Oscarsson Nov 23 '17 at 23:59
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/159716/discussion-between-harry-and-simon-fontana-oscarsson). – Harry Nov 24 '17 at 00:52
  • I mailed the community list, but my message has not shown up there yet. Also, one quick question: does LWT lock the ROW, the COLUMN, or the WHOLE TABLE? – Harry Nov 24 '17 at 01:31
  • @Harry - LWT does not involve locking. See: https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlTransactionsDiffer.html?hl=transaction – Oresztesz Nov 24 '17 at 11:26