
I get a bulk write request from a client for, say, 20 keys. I can either write them to C* in one batch, or write them individually in an async way and wait on the futures for them to complete.

Writing in a batch does not seem to be a good option per the documentation, since my insertion rate will be high and, if the keys belong to different partitions, the coordinators will have to do extra work.

Is there a way in the DataStax Java driver to group keys that belong to the same partition, club them into small batches, and then execute each unlogged batch individually in an async way? That way I make fewer RPC calls to the server, and at the same time the coordinator only has to write locally. I will be using the token aware policy.

Peter

2 Answers


Your idea is right, but there is no built-in way; you usually do that manually.

The main rule here is to use TokenAwarePolicy, so that some coordination happens on the driver side. Then you can group your requests by equality of partition key; that will probably be enough, depending on your workload.

What I mean by 'grouping by equality of partition key' is, e.g., you have some data that looks like

MyData { partitioningKey, clusteringKey, otherValue, andAnotherOne }

Then, when inserting several such objects, you group them by MyData.partitioningKey. That is, for each existing partitioningKey value, you take all objects with the same partitioningKey and wrap them in a BatchStatement. Now you have several BatchStatements, so just execute them.
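The grouping step can be sketched in plain Java. MyData here is a hypothetical stand-in for the shape above, and the actual BatchStatement construction and execution with the driver are left as comments:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByPartitionKey {

    // Hypothetical data holder matching the MyData shape above.
    static class MyData {
        final String partitioningKey;
        final String clusteringKey;
        final String otherValue;

        MyData(String partitioningKey, String clusteringKey, String otherValue) {
            this.partitioningKey = partitioningKey;
            this.clusteringKey = clusteringKey;
            this.otherValue = otherValue;
        }
    }

    // Group rows by partition key; in real driver code, each group would
    // then be wrapped in one unlogged BatchStatement and executed async.
    static Map<String, List<MyData>> groupByPartition(List<MyData> rows) {
        return rows.stream()
                   .collect(Collectors.groupingBy(d -> d.partitioningKey));
    }

    public static void main(String[] args) {
        List<MyData> rows = Arrays.asList(
            new MyData("p1", "c1", "a"),
            new MyData("p2", "c1", "b"),
            new MyData("p1", "c2", "c"));

        Map<String, List<MyData>> groups = groupByPartition(rows);
        // Two distinct partitions; "p1" holds two rows that can share a batch.
        System.out.println(groups.size());
        System.out.println(groups.get("p1").size());
    }
}
```

Each resulting group targets a single partition, so the batch built from it is the cheap single-partition kind discussed in the other answer.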

If you wish to go further and mimic Cassandra's hashing, you should look at the cluster metadata via the getMetadata method of com.datastax.driver.core.Cluster: there is a getTokenRanges method, whose results you can compare against the output of Murmur3Partitioner.getToken (or whichever partitioner you configured in cassandra.yaml). I've never tried that myself, though.

So, I would recommend implementing the first approach and then benchmarking your application. I'm using that approach myself, and on my workload it works far better than no batches, let alone batches without grouping.

folex

Logged batches should be used carefully in Cassandra because they impose additional overhead. It also depends on the distribution of the partition keys. If your bulk write targets a single partition, then using an unlogged batch results in a single insert operation.

In general, writing them individually in an async manner seems to be a good approach, as pointed out here: https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-the-nuanced-edition-dd78d61e9885

You can find sample code on the site above showing how to handle multiple async writes: https://gist.github.com/rssvihla/26271f351bdd679553d55368171407be#file-bulkloader-java https://gist.github.com/rssvihla/4b62b8e5625a805583c1ce39b1260ff4#file-bulkloader-java
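The gists linked above use the driver's futures; the same back-pressure pattern (cap the number of in-flight async writes with a semaphore) can be sketched with only the JDK. Here writeAsync is a hypothetical stand-in for session.executeAsync, and the AtomicInteger only exists to make the sketch observable:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedAsyncWriter {
    private final Semaphore inFlight;                  // caps concurrent writes
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    final AtomicInteger written = new AtomicInteger(); // counts completed writes

    public BoundedAsyncWriter(int maxInFlight) {
        this.inFlight = new Semaphore(maxInFlight);
    }

    // Hypothetical stand-in for session.executeAsync(statement).
    private CompletableFuture<Void> writeAsync(String key) {
        return CompletableFuture.runAsync(written::incrementAndGet, pool);
    }

    // Submit all writes, never exceeding maxInFlight at a time,
    // then block until every write has completed.
    public void writeAll(List<String> keys) throws InterruptedException {
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (String key : keys) {
            inFlight.acquire();                        // wait if the cap is reached
            futures.add(writeAsync(key)
                .whenComplete((result, error) -> inFlight.release()));
        }
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        pool.shutdown();
    }
}
```

Acquiring the permit before submitting each write is what prevents a fast producer from overwhelming the cluster, which is the main risk the linked article warns about.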

EDIT:
please read this also: https://inoio.de/blog/2016/01/13/cassandra-to-batch-or-not-to-batch/#14

What does a single partition batch cost?

There’s no batch log written for single partition batches. The coordinator doesn’t have any extra work (as for multi partition writes) because everything goes into a single partition. Single partition batches are optimized: they are applied with a single RowMutation [10].

In a few words: single partition batches don’t put much more load on the server than normal writes.


What does a multi partition batch cost?

Let me just quote Christopher Batey, because he has summarized this very well in his post “Cassandra anti-pattern: Logged batches” [3]:

Cassandra [is first] writing all the statements to a batch log. That batch log is replicated to two other nodes in case the coordinator fails. If the coordinator fails then another replica for the batch log will take over. [..] The coordinator has to do a lot more work than any other node in the cluster.

Again, in bullets what has to be done:

  1. serialize the batch statements
  2. write the serialized batch to the batch log system table
  3. replicate this serialized batch to 2 nodes
  4. coordinate writes to nodes holding the different partitions
  5. on success remove the serialized batch from the batch log (also on the 2 replicas)

Remember that unlogged batches for multiple partitions are deprecated since Cassandra 2.1.6.

fuggy_yama