0

Given table:

CREATE TABLE T (
      a int,
      last_modification_time timestamp,
      b int,
     PRIMARY KEY (a)
);

I'm frequently updating records. With each update last_modification_time is set to now() and also other fields are set.

What is the right cassandra approach to be able to query by last_modification_time range? I need to query like this:

  • select * from .. where a=Z and last_modification_time < X and last_modification_time > Y;

One way would be to create materialized view with PRIMARY KEY (a, last_modification_time) but I want to avoid this since materialized views are buggy in 3.X cassandra versions.

What would be alternative way of querying by last_modification_time range given last_modification_time is frequently updated?

Michal
  • 2,078
  • 19
  • 29
  • your schema will result in wide rows... if you cannot change data model and will be always using partition key try secondary index... https://stackoverflow.com/questions/35719985/range-query-on-secondary-index-in-cassandra – undefined_variable Oct 05 '17 at 06:03
  • @undefined_variable could you clarify what do you mean by 'wide row'? – Michal Oct 05 '17 at 13:04
  • Your best alternative is probably to use Postgres to support a query model like this. – Aaron Oct 05 '17 at 13:06
  • In this case last_modification_time is updated around 20 times then record becomes stable. Given this constraint do you see any data model supporting range queries on last_modification_time? – Michal Oct 05 '17 at 13:15
  • @undefined_variable secondary indexes wont work - range queries are not allowed on indexed columns – Michal Oct 05 '17 at 15:01
  • Another approach would be to use last_modification_time_one_hour_bucket instead of last_modification_time, then delete record if last_modification_time fits into new bucket and insert a record with new bucket – Michal Oct 05 '17 at 20:26
  • @MichalWegorek Wide row was my mistake... But according to link provided cassandra 3.x does provide inequality clause on secondary index (Not tested myself) – undefined_variable Oct 06 '17 at 05:58
  • @undefined_variable Inequality is not enough to do range queries. I tested 3.9 cassandra, range queries on indexed columns and not supported – Michal Oct 06 '17 at 14:26

2 Answers2

0

How about having two tables? One could hold the current snapshot where you're updating the last_modification_time field and another one which holds the changes over time (something like a history table)? You could write to both of them using BATCH statements.

CREATE TABLE t_modifications (
      a int,
      last_modification_time timestamp,
      b int,
     PRIMARY KEY (a, last_modification_time)
) WITH CLUSTERING ORDER BY (last_modificaton_time DESC);

BEGIN BATCH
      UPDATE T SET last_modification_time = 123, b = 4 WHERE a = 2;
      INSERT INTO t_modifications (a, last_modification_time, b) values (2, 123, 4);
APPLY BATCH;

If you're interested on the latest snapshot by a given modification range, you can select and limit the t_modifications table:

SELECT * FROM t_modifications WHERE a = 2 AND last_modification_time < 136 LIMIT 1;
Oresztesz
  • 2,294
  • 1
  • 15
  • 26
0

In general, to do range queries like this, the field you want to range on has to be part of the composite key, has to be the right-most element of the composite key, and all other elements in the composite key have to be specified. In your case, you would modify your PRIMARY KEY to (a, last_modification_time). You can then

SELECT * from t_modifications 
 WHERE a = aval
   AND last_modification_time > beg
   AND last_modification_time < end;

This will get you all records for aval between beg and end.

Wexxor
  • 1,919
  • 1
  • 15
  • 17