We have an application that uses Cassandra as its data store. For easy access, the same data needs to be stored in multiple tables with different partition keys, and BatchStatements are used to write to all of them. The reason for using a batch statement is to guarantee that the data is written to all of the tables or to none.
With this setup, we recently started seeing a lot of write timeout errors as our user base grew. We came across many blogs and articles pointing out that BatchStatements are often mistakenly used for writes that span multiple partitions.
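For context, the all-or-none write was expressed as a logged batch spanning the tables; a minimal sketch, assuming hypothetical tables `users_by_id` and `users_by_email` that hold the same row under different partition keys:

```cql
BEGIN BATCH
    INSERT INTO users_by_id    (user_id, email, name) VALUES (42, 'a@b.c', 'Ann');
    INSERT INTO users_by_email (email, user_id, name) VALUES ('a@b.c', 42, 'Ann');
APPLY BATCH;
```

Because the two inserts target different partitions, the coordinator has to write a batch log and then contact the replicas for both partitions, which is the pattern the linked articles warn against.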
References:
- https://docs.datastax.com/en/dse/6.0/cql/cql/cql_using/useBatchGoodExample.html
- What is the batch limit in Cassandra?
- Cassandra Batch statement-Multiple tables
- https://grokbase.com/t/cassandra/user/153gsmdzs6/writing-to-multiple-tables
The reason for this seems to be the large load placed on the coordinator node, which in turn causes latencies. One option was to increase write_request_timeout_in_ms in cassandra.yaml above its default (2000 ms). We attempted this, but requests still failed. Hence we updated the setup to use executeAsync instead, and with this the WriteTimeoutExceptions went away completely.
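With the driver, each per-table insert is submitted via `session.executeAsync(...)`, which returns a future, and the futures are collected and awaited. The fan-out/collect pattern can be sketched with plain `CompletableFuture`s standing in for the driver calls (the table names and `writeAsync` helper are hypothetical, not driver API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class AsyncFanOut {
    // Stand-in for session.executeAsync(stmt): each table write becomes an
    // independent future instead of one coordinator-side batch.
    static CompletableFuture<String> writeAsync(String table, String row) {
        return CompletableFuture.supplyAsync(() -> "wrote " + row + " to " + table);
    }

    // Fan out one logical write to every table, then wait for all of them.
    static List<String> writeAll(String row) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (String table : List.of("users_by_id", "users_by_email")) {
            futures.add(writeAsync(table, row));
        }
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> f : futures) {
            results.add(f.join()); // blocks until this write completes
        }
        return results;
    }

    public static void main(String[] args) {
        writeAll("user-42").forEach(System.out::println);
    }
}
```

Each write now lands directly on the replicas owning its partition, which is why the coordinator pressure (and the timeouts) disappeared; the trade-off is that the all-or-none guarantee is gone.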
But now the question is: how do we handle atomicity? Below is the code updated to use executeAsync. Is executeAsync the right alternative to batch statements, and is there any way to handle rollbacks in the exception block?
try {
    // Wait for each async write; get() throws if that write failed
    for (ListenableFuture<ResultSet> futureItem : futureItems) {
        futureItem.get();
    }
} catch (Exception e) {
    // need to handle rollback?
}
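Cassandra itself has no rollback, so one option sometimes used in place of it is best-effort compensation: remember which table writes succeeded and, on failure, issue compensating deletes for those (or simply retry the failed inserts, since re-inserting the same values is idempotent). A sketch of the compensation idea, with all names hypothetical and a `TableWriter` interface standing in for real `session.execute(...)` calls:

```java
import java.util.ArrayList;
import java.util.List;

public class CompensatingWrite {
    // Stand-in for per-table insert/delete statements (hypothetical;
    // real code would execute bound INSERT / DELETE statements).
    interface TableWriter {
        void insert(String row) throws Exception;
        void delete(String row);
    }

    // Apply the write to every table; on any failure, compensate the tables
    // that already succeeded. This is best-effort, NOT a true rollback:
    // a compensating delete can itself fail and then needs retry/logging,
    // and readers may observe the partial state in between.
    static boolean writeAllOrCompensate(List<TableWriter> tables, String row) {
        List<TableWriter> done = new ArrayList<>();
        for (TableWriter t : tables) {
            try {
                t.insert(row);
                done.add(t);
            } catch (Exception e) {
                for (TableWriter d : done) {
                    d.delete(row); // undo the writes that already landed
                }
                return false;
            }
        }
        return true;
    }
}
```

Note the caveats in the comments: compensation only narrows the window of inconsistency, and the deletes create tombstones. If the tables must truly stay in step, a logged batch (accepting its coordinator cost) or retrying the failed inserts until they succeed are the usual alternatives.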