I've been working with Cassandra for a year, and in one of my projects I had to maintain data in several lookup tables. Inserts, updates, and deletes were all orchestrated at the "service" layer. One of my concerns was consistency. I know Cassandra gives that up to offer availability and partition tolerance (this can be tuned, but the project required A and P rather than C).

When I say consistency, I'm thinking about this scenario:

Keyspace [User] {
  userId,
  email,
  phoneNumber,
  firstName,
  lastName
} Primary Key (userId)

LookupTables:

  • UserByPhoneNumber
  • UserByEmail
  • UserByLastName

With the architecture we used, when a client calls service.save(User user), it triggers writes to the lookup tables, filling in data in all of them. Given that, what happens if the insert fails in one of them during the process? Should I also handle that case in my code? We managed it using BatchStatement. Was that the best approach?

Cassandra version: 2.x

1 Answer

First, I would like to define consistency. I think you have mixed up Cassandra's consistency level with atomicity. Your actual concern seems to be how to keep data consistent across related tables.

Cassandra Tunable Consistency

Consistency refers to how up-to-date and synchronized a row of Cassandra data is on all of its replicas.
Cassandra is typically classified as an AP system, meaning that availability and partition tolerance are generally considered more important than consistency. However, Cassandra can be tuned through the replication factor and consistency level to provide strong consistency as well.

Cassandra is best suited to workloads where strong consistency is not needed. You will get the most up-to-date data eventually.
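
To make the tuning concrete, here is a minimal sketch (not part of the original answer) of the usual rule of thumb: with replication factor RF, a read is guaranteed to overlap the latest acknowledged write when the number of replicas written plus the number of replicas read is greater than RF. The class and method names are hypothetical.

```java
/** Sketch of the replica-overlap rule behind tunable consistency. */
class ConsistencySketch {

    /** Number of replicas that make up a quorum for a given replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /**
     * True when every read set must intersect every write set,
     * i.e. writeReplicas + readReplicas > replicationFactor.
     */
    static boolean readsOverlapWrites(int writeReplicas, int readReplicas, int replicationFactor) {
        return writeReplicas + readReplicas > replicationFactor;
    }
}
```

For example, with RF = 3, writing and reading at QUORUM (2 replicas each) satisfies the rule, while writing and reading at ONE does not, which is the AP-leaning setup the question describes.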

Now on to the data modeling part. You are on the right track. :)

It is very important to know your queries before you design your model. There are a few possible solutions for this case.

  1. Usage of Cassandra Secondary Index

You could create secondary indexes on those columns to query and get the data you want. In this case you don't have to manage any lookup tables, so inconsistent data among tables cannot arise. But this is not a good solution for this scenario. The reason is described at the link below:

https://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_when_use_index_c.html

It would probably be more efficient to manually maintain the table as a form of an index instead of using the Cassandra built-in index.

Reads will also be slower, because every node has to be queried to get the required results. Since Cassandra writes are much faster, we maintain tables (one per query if needed) that act as the index and serve the queries, and we denormalize data to make reads faster. But then the problem of keeping data consistent among those tables arises: if an update happens, how do you keep the indexed/denormalized data consistent in all tables?

  2. Using a batch operation

To maintain data consistency between these tables (how strictly depends on the use case), use a batch if you want atomicity across these updates.

If your system (cluster health) is okay, Cassandra will normally complete all the writes successfully. If an occasional failed write is acceptable for your use case (for example, it is okay that a user sometimes cannot be found by email or phone number), you may skip the batch, because the coordinator has to do a lot of extra work to maintain one. But in your case you can use a batch.
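
A minimal sketch of what such a batch could look like follows. It only assembles the CQL text and bind values, and it assumes hypothetical table and column names (user_account, user_by_email, user_by_phone_number, user_by_last_name) that are not given in the question. A plain BEGIN BATCH is a logged batch, which is the kind that gives the all-or-nothing guarantee; it does not give isolation, so readers may briefly see some tables updated before others.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: one logged batch that writes a user to the base table and its lookup tables. */
class UserBatchSketch {

    /** CQL text with '?' bind markers plus the values to bind, in order. */
    static final class BoundCql {
        final String cql;
        final List<Object> values;

        BoundCql(String cql, List<Object> values) {
            this.cql = cql;
            this.values = values;
        }
    }

    static BoundCql saveUserBatch(String userId, String email, String phoneNumber,
                                  String firstName, String lastName) {
        StringBuilder cql = new StringBuilder("BEGIN BATCH\n");
        List<Object> values = new ArrayList<>();

        // Base table row (hypothetical table name).
        cql.append("  INSERT INTO user_account (user_id, email, phone_number, first_name, last_name)"
                + " VALUES (?, ?, ?, ?, ?);\n");
        values.addAll(Arrays.asList(userId, email, phoneNumber, firstName, lastName));

        // One row per lookup table, keyed by the lookup column.
        cql.append("  INSERT INTO user_by_email (email, user_id) VALUES (?, ?);\n");
        values.addAll(Arrays.asList(email, userId));

        cql.append("  INSERT INTO user_by_phone_number (phone_number, user_id) VALUES (?, ?);\n");
        values.addAll(Arrays.asList(phoneNumber, userId));

        cql.append("  INSERT INTO user_by_last_name (last_name, user_id) VALUES (?, ?);\n");
        values.addAll(Arrays.asList(lastName, userId));

        cql.append("APPLY BATCH;");
        return new BoundCql(cql.toString(), values);
    }
}
```

With the DataStax Java driver and a connected Session, a result like this could presumably be run with something like session.execute(b.cql, b.values.toArray()). The BatchStatement mentioned in the question expresses the same idea with individually bound statements.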

Additionally, if you are using Cassandra 3.0, you can use materialized views, where Cassandra itself maintains data consistency between the base table and the views.
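
As a rough sketch of the materialized view option, the helper below generates the CREATE MATERIALIZED VIEW statement that would replace one hand-maintained lookup table. The helper itself and the table and column names passed to it are hypothetical. Every primary key column of the view has to be restricted with IS NOT NULL, and the base table's key must be part of the view's key.

```java
/** Sketch: generates the DDL for a view that replaces a manual lookup table. */
class MaterializedViewSketch {

    static String createViewCql(String viewName, String baseTable,
                                String lookupColumn, String baseKeyColumn) {
        return "CREATE MATERIALIZED VIEW " + viewName + " AS\n"
             + "  SELECT * FROM " + baseTable + "\n"
             + "  WHERE " + lookupColumn + " IS NOT NULL AND " + baseKeyColumn + " IS NOT NULL\n"
             // The lookup column becomes the partition key; the base key keeps rows unique.
             + "  PRIMARY KEY (" + lookupColumn + ", " + baseKeyColumn + ");";
    }
}
```

Calling createViewCql("user_by_email", "user_account", "email", "user_id") would yield the DDL for an email lookup. After that, the service layer writes only to the base table and Cassandra propagates changes to the view.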

There are many questions related to this, for example:

How to ensure data consistency in Cassandra on different tables?

Chaity