0

Before you downvote I would like to state that I looked at all of the similar questions but I am still getting the dreaded "PRIMARY KEY column cannot be restricted" error.

Here's my table structure:

CREATE TABLE IF NOT EXISTS events (
    id text,
    name text,
    start_time timestamp,
    end_time timestamp,
    parameters blob,
    PRIMARY KEY (id, name, start_time, end_time)
);

And here's the query I am trying to execute:

SELECT * FROM events WHERE name = ? AND start_time >= ? AND end_time <= ?;

I am really stuck at this. Can anyone tell me what I am doing wrong?

Thanks, Deniz

rogerdpack
  • 62,887
  • 36
  • 269
  • 388
Deniz Acay
  • 1,609
  • 1
  • 13
  • 24

3 Answers3

1

To run the query, use the ALLOW FILTERING clause,

SELECT * FROM analytics.events WHERE name = ? AND start_time >= ? AND end_time <= ? ALLOW FILTERING;
Kishore
  • 5,761
  • 5
  • 28
  • 53
1

This is a query you need to remodel your data for, or use a distributed analytics platform (like spark). Id describes how your data is distributed through the database. Since it is not specified in this query a full table scan will be required to determine the necessary rows. The Cassandra design team has decided that they would rather you not do a query at all rather than do a query which will not scale.

Basically whenever you see "COLUMN cannot be restricted" It means that the query you have tried to perform cannot be done efficiently on the table you created.

RussS
  • 16,476
  • 1
  • 34
  • 62
  • 1
    Thanks for answer. Although I am using a single mode embedded Cassandra, I decided to use Spark. I wondered if it could be solved and your answer made sense of this behavior. – Deniz Acay Apr 03 '15 at 00:00
0

The "general" rule to make query is you have to pass at least all partition key columns, then you can add each key in the order they're set." So in order for you to make this work you'd need to add where id = x in there.

However, it appears what this error message is implying is that once you select 'start_time > 34' that's as far "down the chain" you're allowed to go otherwise it would require the "potentially too costly" ALLOW FILTERING flag. So it has to be "only equality" down to one < > combo on a single column. All in the name of speed. This works (though doesn't give a range query):

 SELECT * FROM events WHERE name = 'a' AND start_time = 33 and end_time <= 34 and id = '35';

If you're looking for events "happening at minute y" maybe a different data model would be possible, like adding an event for each minute the event is ongoing or what not, or bucketing based on "hour" or what not. See also https://stackoverflow.com/a/48755855/32453

rogerdpack
  • 62,887
  • 36
  • 269
  • 388