20

Very new to Cassandra so apologies if the question is simple.

I created a table:

create table ApiLog (
LogId uuid,     
DateCreated timestamp,
ClientIpAddress varchar,
primary key (LogId, DateCreated));

This work fine:

select * from apilog

If I try to add a where clause with the DateCreated like this:

select * from apilog where datecreated <= '2016-07-14'

I get this:

Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING

From other questions here on SO and from the tutorials on datastax it is my understanding that since the datecreated column is a clustering key it can be used to filter data.

I also tried to create an index but I get the same message back. And I tried to remove the DateCreated from the primary key and have it only as an index and I still get the same back:

create index ApiLog_DateCreated on dotnetdemo.apilog (datecreated);
Jason
  • 507
  • 1
  • 6
  • 16

4 Answers4

23

The partition key LogId determines on which node each partition will be stored. So if you don't specify the partition key, then Cassandra has to filter all the partitions of this table on all the nodes to find matching data. That's why you have to say ALLOW FILTERING, since that operation is very inefficient and is discouraged.

If you specify a specific LogId, then Cassandra can find the partition on a single node and efficiently do a range query by the clustering key.

So you need to plan your schema such that you can do your range queries within a single partition and not have to do a full table scan like you're trying to do.

Jim Meyer
  • 9,275
  • 1
  • 24
  • 49
  • So am I right to assume then that having a PK that contains a uuid as the partition key is always a bad idea because you can never efficiently query the data? – Jason Jul 13 '16 at 11:50
  • 1
    Yes, that's usually a bad idea since you will have partitions that only have one row in them. This would only be useful if your application needed to work on one row at a time. – Jim Meyer Jul 13 '16 at 11:54
  • 6
    I see now. I guess it takes some time to get thinking in a different way when coming from RDBMS, thanks. – Jason Jul 13 '16 at 12:03
2

This command will work:

Select * FROM activity WHERE home_id = 'H01474777' AND code_used = '5599'ALLOW FILTERING;
Adrian Mole
  • 49,934
  • 160
  • 51
  • 83
Sumit_Keen
  • 31
  • 2
1

When your query is rejected by Cassandra because it needs filtering, you should resist the urge to just add ALLOW FILTERING to it. You should think about your data, your model and what you are trying to do. You always have multiple options.

You can change your data model, add an index, use another table or use ALLOW FILTERING.

You have to make the right choice for your specific use case.

Akash Mulik
  • 315
  • 3
  • 12
1

Anyway you want to make it work.

select * from dev."3" where "column" = '' limit 1000 ALLOW FILTERING;
gamingflexer
  • 250
  • 3
  • 5