1

I am new to cassandra and need help in data modelling the below scenario. In oracle i have two tables similar to below, which i am thinking of migrating to cassandra.

Usecases to solve: 1. Heavy reads and writes. 2. User data is create first, and we can keep adding or removing accounts to it. 3. Also will be having partial updates to one of the user account like updating the amount or some details of account. 4. User_data has a field to store the count of active user_accounts present. So whenever we add or delete an entry/row in user_account table. it will trigger an update in user_data.

Basically i am not clear how to model these scenarios. Whether to have one single table. But with this, i am not sure of number of the accounts If i have single table and have user_accounts as one of the column with type json. Then, i believe i can't do partial update with in that json.

Main problem in thinking of two table is transaction management. If i was able to add to user_account, but unable to update the user_data then it will be a failure.

create table USER_DATA ( userId uuid PRIMARY KEY, name varchar, noOfAccounts int,

..... #Some more columns ...,

);

create table USER_ACCOUNTS ( userId uuid accountId uuid ,
amt int, ..... #Some more columns ...,

PRIMARY KEY (uuid,accountId) );

I tried using list of FROZEN USER_ACCOUNTS, but with this we need to read entire list and write back during every addition/deletion or updation to one of its entry.

I tried using json type, but of no use.

  • Cassandra is usually modeled by query use-case. All appropriately modeled identities use the partition key during the query. So you need to ensure the table you build has a partition key that is queried. If you query the table multiple ways (sometimes filtering on COLX and sometimes using COLY), you would design two tables, one with COLX as the partition key and one with COLY as the partition key. From a relational stance, it isn't pretty, but that's how you achieve speed. Partition keys also determine data distribution. I believe json fields must be updated entirely, as you have stated. – Jim Wartnick May 03 '19 at 18:33

1 Answers1

1

Let me highlight on important point before proceed: Are you sure you need NoSQL and exact Cassandra to store users and accounts?

The Cassandra is designed for large-scale distributed data and it is optimized for very fast writes. If you are still thinking about choosing a solution, I would recommend to invest some amount of time for investigation of existed solutions, and cases where they are effective/no-effective. There are a lot of articles across internet. E.g. https://www.infoworld.com/article/3268871/how-to-choose-the-right-type-of-database-for-your-enterprise.html

Cassandra.

Important questions before choosing the structure:

  • How often user adds new account and deletes existed?
  • How many users make that concurrently?
  • How many accounts typical user has?

Highly suggested to write performance tests for all of your use cases and check the performance before making any decisions.

Original storage structure is okay to start playing with performance tests but with few improvements:

create table users.user_data (user_id uuid PRIMARY KEY, 
              name varchar, 
              account_count counter, 
              some_other_column varchar);

create table users.user_account (user_id uuid account_id uuid , amt int, 
PRIMARY KEY (user_id, account_id));
  • Type of users.user_data.account_count field is counter
  • Both tables are stored within keyspace users. Keyspace configuration is important for performance.

In case when we add new account for user or delete existed account we should make transaction affecting several tables.

Maybe the most obvious thing appear is multi partition batches but that is slow. Carefully to read: Doc

Suggested to try asynchronous writes using driver code. Select Cassandra DataStax driver for your programming language. Here is an abstract example based on Java code to understand the idea:

session.executeAsync("insert into users.user_account ...");
Futures.addCallback(future,
    new FutureCallback<ResultSet>() {
        @Override public void onSuccess(ResultSet result) {
            // Run query for incrementing counter in users.user_data table
        }
        @Override public void onFailure(Throwable t) {}
    },
    MoreExecutors.sameThreadExecutor() );

Update(14 May 2019):

Alternative solution to play with: single table and static columns Have a look at https://blog.ippon.tech/modeling-data-with-cassandra-what-cql-hides-away-from-you/

It seems static columns could help you!

create table users.user_data (user_id uuid PRIMARY KEY, 
              name varchar static, 
              account_count counter static, 
              some_other_column varchar static,
              account_id uuid, 
              amt int, 
              PRIMARY KEY (user_id, account_id));
  • Columns which were originally not belong to user_account table are marked as static
  • Static columns are only stored once internally
  • user_id is partition key and account_id is clustering key. Explanation

Counter column could be static according to Is this type of counter table definition valid?

Yurii Bratchuk
  • 920
  • 9
  • 12
  • Appreciate your comments Yurii. – durga prasad May 12 '19 at 19:46
  • Came up with new table structure to store all the data in one row. Please share your valuable thoughts. create table USER_DATA ( userId uuid PRIMARY KEY, name text, noOfAccounts int, user_accounts Map, // key is the userAccountId(UUID) and value is the json payload in text format. user_group_id ); Partial updates of user_account data. Like updating one specific field in one of the user accounts. Currently i am unable to achieve this and just overriding the entire user_account(only one account though) Fetch can happen either using userId or userGroupId – durga prasad May 12 '19 at 19:47
  • There is simpler solution possible. Please check the update in answer above. Do not forget about performance tests and consider to mark the answer as accepted, if it helped to solve your issue. Thank you! – Yurii Bratchuk May 14 '19 at 20:22