
Is there some timestamp/counter that can be used to validate that in a read-modify-write cycle, the data in the row did not change between reading and modifying?

In other words, can I read some kind of ID while reading the row, and when I write it back tell Cassandra what that ID was, and the write then fails if the ID changed since then? (Which amounts to saying that some other write took place after I read the data)

Sebastien Diot

1 Answer

Each column in Cassandra is a tuple (a triplet) that contains a name, a value and a timestamp. The timestamp of the column represents the last time it was modified. If you have hundreds of nodes and they disagree about a column, the update with the most recent timestamp wins. This is how conflicts are resolved under eventual consistency.
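To make the "most recent timestamp wins" rule concrete, here is a minimal sketch in Python. It is a toy model, not Cassandra's code: the `Column` class, `reconcile` and `read_from` are hypothetical names invented for illustration, replicas are plain dicts, and ties are broken by simply keeping the first version seen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Toy model of a column: name, value and last-modified timestamp."""
    name: str
    value: str
    timestamp: int  # assumed here to be an integer clock value set by the writer


def reconcile(a: Column, b: Column) -> Column:
    """Return the version with the most recent timestamp (last write wins)."""
    # Simplification: on a tie, keep the first version.
    return a if a.timestamp >= b.timestamp else b


def read_from(replicas, name):
    """Collect every replica's version of a column and reconcile them."""
    versions = [replica[name] for replica in replicas if name in replica]
    winner = versions[0]
    for version in versions[1:]:
        winner = reconcile(winner, version)
    return winner


# Two replicas (plain dicts in this sketch) that disagree about one column.
node_a = {"email": Column("email", "old@example.com", 1000)}
node_z = {"email": Column("email", "new@example.com", 1001)}

latest = read_from([node_a, node_z], "email")
```

In this sketch `latest` is the version held by `node_z`, because its timestamp is higher. A read that only consulted `node_a` would have returned the older value.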

zznate has a good presentation: Introduction to Apache Cassandra for Java Developers where this topic is referenced (slide 37)

Accessing timestamp of a Cassandra column

In summary, you don't need "some kind of ID", because you can retrieve the timestamp of a given column, which represents the last time it was modified. However, at scale, with hundreds of nodes, how can you be sure that the node you are connecting to has the most up-to-date column? (Refer back to the zznate presentation.)

Point is, you can't, without enabling transactions:

  1. Cassandra - transaction support
  2. Cassandra Transaction with ZooKeeper - Does this work?
  3. how to integrate cassandra with zookeeper to support transactions
  4. And many more: cassandra & transactions
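The problem described above, where the node you connect to may not yet hold the most recent column, can be simulated with a small sketch. Everything here is hypothetical and only illustrates the reasoning: `Replica` and `check_then_write` are invented names, not part of any Cassandra client API, and replication is modelled by calling `write` on the other replica by hand.

```python
class Replica:
    """Toy replica that stores (value, timestamp) per column name."""

    def __init__(self):
        self.columns = {}

    def read(self, name):
        return self.columns.get(name)

    def write(self, name, value, timestamp):
        # Last write wins: only accept a strictly newer timestamp.
        current = self.columns.get(name)
        if current is None or timestamp > current[1]:
            self.columns[name] = (value, timestamp)


def check_then_write(replica, name, expected_ts, new_value, new_ts):
    """Naive 'compare timestamp, then write' against a single replica."""
    current = replica.read(name)
    if current is not None and current[1] != expected_ts:
        return False  # the column changed since we read it
    replica.write(name, new_value, new_ts)
    return True


node_a, node_z = Replica(), Replica()
node_a.write("balance", "100", 1000)
node_z.write("balance", "100", 1000)

# Client 1 reads the column and remembers its timestamp.
value, seen_ts = node_a.read("balance")

# Another client's update reaches node Z but has not reached node A yet.
node_z.write("balance", "150", 1001)

# Client 1's check runs against node A, which still shows timestamp 1000.
ok = check_then_write(node_a, "balance", seen_ts, "90", 1002)

# When client 1's write later replicates to node Z, it overwrites the update.
node_z.write("balance", "90", 1002)
```

Here `ok` ends up `True` even though the column was modified in between: the check passed on a replica that had not yet seen the newer write, and the intermediate update is silently overwritten once replication catches up. A timestamp comparison alone therefore does not give a safe read-modify-write cycle.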
sdolgy
  • I'm not totally sure I understand. You are saying that the "ID" (timestamp) exists per column. So if I read data, noted the timestamps of all "columns" involved as *input* and, when I wrote the modified data back, asked Cassandra to check that those timestamps haven't changed, then I would have what I want, assuming the API would allow that. But you say it won't work. Is it because even if the nodes have the same data from the same insert/update, different nodes would have differing timestamps? That is the only thing that would make it *impossible*, IMHO. Otherwise, it's just a "missing feature". – Sebastien Diot Mar 28 '12 at 10:34
  • (Hit max comment size) I understood that modifications were transactional *within one row*. This is all I'm asking for. Read *one* row, modify it, and update *the same row*, making sure it hasn't changed, and fail if it has. Would this make it likely that half would accept the change, and the others would not, causing the data to become inconsistent? That might be the problem you are expecting. – Sebastien Diot Mar 28 '12 at 10:39
  • Yes, you are correct. Node A could have column A with a timestamp of xxxxyyyy, and when you query it, it looks correct. However, node Z also has column A with a timestamp of xxxxyyyz (newer), and that change hasn't fully propagated to the other nodes where it's required based on the specified replication factor. – sdolgy Mar 28 '12 at 10:40