Cassandra | How can I compare the current set of data with the previous one?

Question

I am new to Cassandra and need to store a set of data in a table periodically (every 15 minutes). Each set can contain 1500 records. I have to insert each set into a Cassandra table in such a way that all 1500 records share the same partition key, meaning they must all reside on the same node.

After 15 minutes, another batch of 1500 records has to be stored in the same fashion, but under a different partition key.

The goal is to compare the last two sets of data and find the records that differ. The 1500 current records will be compared with the 1500 previous records; I need to find out which ones have changed and then run some business logic on the changed ones.
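
The comparison step described here could be sketched in plain Java as follows. This is only an illustration under assumptions that the question does not state: each record is assumed to have a stable ID that is the same in both batches, and its payload is simplified to a single string. The names `BatchDiff` and `changedIds` are hypothetical, and reading the two batches from Cassandra is left out.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: compares two batches held in memory as id -> value maps.
final class BatchDiff {

    private BatchDiff() {
    }

    /**
     * Returns the IDs whose value differs between the two batches,
     * including IDs that appear in only one of them.
     */
    static Set<String> changedIds(Map<String, String> previous,
                                  Map<String, String> current) {
        Set<String> changed = new TreeSet<>();

        // Records that are new, or whose value changed since the previous batch.
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String id = entry.getKey();
            if (!previous.containsKey(id)
                    || !Objects.equals(previous.get(id), entry.getValue())) {
                changed.add(id);
            }
        }

        // Records that existed in the previous batch but are missing now.
        for (String id : previous.keySet()) {
            if (!current.containsKey(id)) {
                changed.add(id);
            }
        }
        return changed;
    }
}
```

For example, with a previous batch `{r1=a, r2=b, r3=c}` and a current batch `{r1=a, r2=B, r4=d}`, `changedIds` returns `[r2, r3, r4]`: `r2` was modified, `r3` disappeared, and `r4` is new. With only 1500 records per batch, holding both sets in memory for this comparison should be inexpensive.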

If I use a timeuuid as the partition key, each of my 1500 records will get a different timeuuid, so they will not all be on the same node.
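
One possible way around the per-record timeuuid problem, sketched here rather than taken from the question, is to derive a single value per batch from the batch time and use that as the partition key, for instance the start of the 15-minute window. Rows that share a partition key are stored in the same partition, so all records of a batch would then live together, and the previous batch's key can be computed instead of looked up. The class and method names below are hypothetical, and the sketch assumes batches are aligned to wall-clock 15-minute windows in UTC.

```java
import java.time.Instant;

// Hypothetical helper: maps a timestamp to the 15-minute window it falls into.
final class BatchBucket {

    /** Window length in seconds (15 minutes, as in the question). */
    static final long WINDOW_SECONDS = 15 * 60;

    private BatchBucket() {
    }

    /** Start of the window containing the given instant, in epoch seconds. */
    static long bucketFor(Instant time) {
        return Math.floorDiv(time.getEpochSecond(), WINDOW_SECONDS) * WINDOW_SECONDS;
    }

    /** Key of the window immediately before the given one. */
    static long previousBucket(long bucket) {
        return bucket - WINDOW_SECONDS;
    }
}
```

Every record written during the same window would carry the same `bucketFor(...)` value as its partition key, with the record's own ID as a clustering column, so the two sets to compare can be read as two single-partition queries.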

I searched for ways to maintain incremental counters in Cassandra (see "How to create auto increment IDs in Cassandra"), but there seems to be no good way to do it. Besides, maintaining a COUNTER table on a single node is an anti-pattern in a distributed design.

Can you please suggest the optimal way to solve this problem?

In simpler words, my requirement comes down to:

How can I compare the current set of data with the previous one?

By the way, I will be using Spring Boot to connect to Cassandra and write the data.

Thanks in advance!

Using a timestamp will guarantee a different partition key, but I must ask: why do you need the data to be on the same node? The very nature of Cassandra is that you don't care where the data is, just that it is somewhere in the cluster and you can retrieve it. — markc, Jan 15 '19 at 09:04


0 Answers