0

I have some billions records with 15 fields, which I want to insert them into Cassandra (with Java api). Since my queries search key can be one of the five different fields of record (i.e search query on fields 3 or 7 or 8 or 13 or 14), so I have created 5 identical tables with different primary keys in Cassandra (similar the note that is mentioned in enter link description here).

Now I read a record (or a batch of the records) and call "inserting into Cassandra" 5 times.

I want to know is there a mechanism in Cassandra that makes me to call "inserting into Cassandra" one times and storing the record(s) into 5 tables automatically?

For example the record(s) stores in MemTable at once (from my code by inserting at once) and the Cassandra core stores them in 5 tables in SSTable?

1 Answers1

1

Since Cassandra 3.0 there is support for materialized views that could help you. But you need to design your source table carefully, as there is a number of limitations on how you can change structure of the materialized views comparing to source table - most notably: * you can add to primary key at most one column that isn't in the primary key of source table; * materialized view's primary key should contain all components of primary key of source table, but you can use different order of columns in primary key. * all columns of materialized view's primary key should be non-null.

More details on these limitations you can find in this blog post.

You also need to be careful with changing partition key to not to get the big partitions (but you may have the same problem if you write data manually). Also, take into account that this adds more load to coordinator node that will need to distribute data to other servers if partition key is changed - when you write data "manually" then driver will send request directly to replica that holds that data.

Syntax for creation of materialized views is in the documentation - it quite similar to SQL's but not exactly (example from documentation):

CREATE TABLE cyclist_mv (cid UUID PRIMARY KEY, 
   name text, age int, birthday date, country text);

CREATE MATERIALIZED VIEW cyclist_by_age 
  AS SELECT age, birthday, name, country 
  FROM cyclist_mv 
  WHERE age IS NOT NULL AND cid IS NOT NULL 
  PRIMARY KEY (age, cid);

In this case, we move from one column in primary key (cid) to 2 columns in the primary key (age, and cid). Note the explicit check for non-NULL values in theWHERE` condition.

Alex Ott
  • 80,552
  • 8
  • 87
  • 132
  • Thank you very much, but in my design as I said in the question, I have some queries on different keys (query just on field "3" or just on field "7" and not on fields "3,7"). I think materialized view may not help me. Also since I have some billions records, so materialized view which is stored in the memory, does not fit inside the ram... – ahmadi morteza ali Dec 28 '18 at 07:22
  • No, Materialized view isn't stored in the memory - it's a table that is simply written by Cassandra, not manually... Regarding the search - even if you have multiple clustering columns, you can still search only by partition key, or only by first N clustering columns... (I recommend to watch DS220 course ad DataStax Academy to get more information about design and queries) – Alex Ott Dec 28 '18 at 08:18
  • Thanks, so what about my keys? Can I have some different keys on materialized view without overlapping on base table keys? – ahmadi morteza ali Dec 29 '18 at 06:45
  • You can have only one additional key that isn’t a part of primary key of base table – Alex Ott Dec 29 '18 at 09:31