1

Looking to filter all users who 'held' during a particular date range without using the IN() statement with a list of each date in Cassandra, but the caveat is that there are 3 partition keys in my table. If this is possible using token() function in Cassandra, please share.

Here's the example table:

CREATE TABLE priceOverTime (
  id text,
  date timestamp,
  action text,
  price double,
  PRIMARY KEY ((id, date, action), price)

)

..

Partition keys: id, time, action

Clustering keys: price

Trying to use something similar to this StackOverflow question, but can't get it to return the desired output.

SELECT * FROM priceOverTime WHERE date > '2017-01-01 00:00:00+0000' 
                        AND date <= '2017-01-04 23:00:00+0000' 
                        AND token(id) > previous_token 
                        AND  token(action) = token('hold') 
                        AND LIMIT 100 ALLOW FILTERING;

Example Data

id |   date               | action  | price    
--------------------------+----------
1 | 2017-01-01 00:00:00-0000 | buy | 100
1 | 2017-01-02 00:00:00-0000 | buy | 105
1 | 2017-01-03 00:00:00-0000 | sell | 103
1 | 2017-01-04 00:00:00-0000 | buy  | 102
2 | 2017-01-01 00:00:00-0000 | buy | 100
2 | 2017-01-02 00:00:00-0000 | sell | 105
2 | 2017-01-03 00:00:00-0000 | hold | 103
2 | 2017-01-04 00:00:00-0000 | hold  | 102

Desired Output

2 | 2017-01-03 00:00:00-0000 | hold | 103
2 | 2017-01-04 00:00:00-0000 | hold  | 102

Note: Using Cassandra Version 2.2. Also, my version of Cassandra does suggest using token() function for this type of filtering -- shown in a previous error message.

Community
  • 1
  • 1
eyeOfTheStorm
  • 351
  • 1
  • 5
  • 15

1 Answers1

2

I am going to assume that you are actually using a PRIMARY KEY of (id, date, action) so as to match your column definitions.

If this is a query that you need to satisfy then the best way to accomplish this is to create separate table containing the data that you need. Both the method that you are trying above and the one shown in the referenced post are really sub-optimal and anti-patterns. C* is designed to let you query by the partition key for a reason. Adding ALLOW FILTERING to your query basically makes it return all the data in the table and then filter it out based on your criteria. In a distributed datastore such as C* this scatter gather type operation is extremely costly and will normally timeout with any reasonably large sized data set.

bechbd
  • 6,206
  • 3
  • 28
  • 47
  • Without creating a new table, do you know how to use the primary keys to filter by date range, where user 'holds' for all id's -- using token() ? Changing tables may not be practical. I understand "previous_token" helps get around timeout issue... – eyeOfTheStorm Mar 07 '17 at 01:49
  • You might be able to do this with something like this, but I have not checked it. You need to include the entire partition key in the token function for it to work: SELECT *, token(id, action, date) as previous_token FROM priceOverTime WHERE token(id, action, date) > previous_token LIMIT 100 ALLOW FILTERING; – bechbd Mar 07 '17 at 14:00