0

I have below Cassandra table.

 create table person( 
   id int PRIMARY KEY,  
   name text, 
   imp_dates map<text,timestamp>
 );

Data inserted as below

insert into person(id,name,imp_dates) values(1,'one',{'birth':'1982-04-01','marriage':'2018-04-01'});
insert into person(id,name,imp_dates) values(2,'two',{'birth':'1980-04-01','marriage':'2010-04-01'});
insert into person(id,name,imp_dates) values(3,'three',{'birth':'1980-04-01','graduation':'2012-04-01'});

 id | name  | imp_dates
----+-------+-----------------------------------------------------------------------------------------------
  1 |   one |   {'birth': '1982-03-31 18:30:00.000000+0000', 'marriage': '2018-03-31 18:30:00.000000+0000'}
  2 |   two |   {'birth': '1980-03-31 18:30:00.000000+0000', 'marriage': '2010-03-31 18:30:00.000000+0000'}
  3 | three | {'birth': '1980-03-31 18:30:00.000000+0000', 'graduation': '2012-03-31 18:30:00.000000+0000'}

I have requirement to write query as below. This required range on map value column.

 select id,name,imp_dates from person where id =1 and imp_dates['birth'] < '2000-04-01';

I get following error

 Error from server: code=2200 [Invalid query] message="Only EQ relations are supported on map entries"

The possible solution I can think of is:

1) Make map flat into multiple columns and then make it part of primary key. this will work but its not flexible since I may have to alter the schema

2) I can create another table person_id_by_important_dates to replace Map but then I loose read consistency as I have to read from two tables and join myself.

I do not wish to include imp_dates (map) part of primary key as it will create new row every time I insert with new values.

Appreciate help with this. Thanks

Himanshu Ahire
  • 677
  • 5
  • 18
  • The second solution seems to be the best. It is in line with how Cassandra works. You will have to pay attention to the primary key in the second table - it must have the person id and the important_date name, otherwise you will overwrite it each time you add an important date for that person. Cassandra works with denormalization and this is what you are doing here. You can also read this - [selecting on two tables in Cassandra](https://stackoverflow.com/questions/33981408/selecting-on-two-tables-in-cassandra) – Horia Jan 15 '19 at 19:05
  • Thanks @Horia yes second solutions seems more realistic one for production use. I am not very keen in creating Cartesian product as suggested by link because in reality I will be storing blobs and duplicate data will be expensive for millions of rows. – Himanshu Ahire Jan 15 '19 at 20:10
  • Would it be an option to store the blobs outside Cassandra? And store in Cassandra only the reference to the blob – Horia Jan 16 '19 at 07:36
  • I would be storing it as Json String. If its not Json I would have used any RDBMS solution since meta data wont take much space. I decided to make it de-normalize unusual way. i.e. will be converting list to columns as I have few number of imp_dates on which need to do range query . – Himanshu Ahire Jan 17 '19 at 21:06

0 Answers0