how to do the query in cassandra If i have two cluster key in column family

Question

I have a column family and syntax like this:

CREATE TABLE sr_number_callrecord ( 
  id int, 
  callerph text, 
  sr_number text, 
  callid text, 
  start_time text, 
  plan_id int, 
  PRIMARY KEY((sr_number), start_time, callerph) 
);

I want to do the query like :

  a) select * from dummy where sr_number='+919xxxx8383' 
                   and start_time >='2014-12-02 08:23:18' limit 10;

  b)  select * from dummy where sr_number='+919xxxxxx83' 
                          and start_time >='2014-12-02 08:23:18' 
                          and callerph='+9120xxxxxxxx0' limit 10;

First query works fine but second query is giving error like

Bad Request: PRIMARY KEY column "callerph" cannot be restricted 
(preceding column "start_time" is either not restricted or by a non-EQ 
relation)

If I get the result in first query, In second query I am just adding one
more cluster key to get filter result and the row will be less

score 6 · Accepted Answer · answered Aug 19 '15 at 11:20

6

Just like you cannot skip PRIMARY KEY components, you may only use a non-equals operator on the last component that you query (which is why your 1st query works).

If you do need to serve both of the queries you have listed above, then you will need to have separate query tables for each. To serve the second query, a query table (with the same columns) will work if you define it with a PRIMARY KEY like this:

PRIMARY KEY((sr_number), callerph, start_time)

That way you are still specifying the parts of your PRIMARY KEY in order, and your non-equals condition is on the last PRIMARY KEY component.

answered Aug 19 '15 at 11:20

Aaron

55,518
11
116
132

Thanks for the info. I am going through cassandra 2.2 document what I understand that it is possible to do the above query. I am not sure it is possible or not. I am using cassandra 2.0.11 . Do you have any idea about it?? – Birendra Kumar Aug 19 '15 at 11:48
@BirendraKumar 2.2 uses the same underlying storage engine as 2.0 does, so this query still won't work in 2.2. Perhaps it may with a secondary index on `start_time`, but I don't recommend that. However 3.0 will have materialized views, which should make query table generation (and ultimately solving problems like this) much easier. – Aaron Aug 19 '15 at 12:28
@BryceAtNetwork23 why don't you recommend using secondary indices? – Haspemulator Aug 19 '15 at 20:22
@Haspemulator Because they are designed for *convenience*, not for *performance*. – Aaron Aug 19 '15 at 20:23
@BryceAtNetwork23 I believe, such tradeoffs should be made by the user himself. Secondary indices are not evil, you just need to understand how they work. Furthermore, they have their own benefits against the solution you propose. – Haspemulator Aug 19 '15 at 20:42
I designed for performance not convenience, I will have millions data in that case it is not recommended to create secondary index on start_time!!!! – Birendra Kumar Aug 20 '15 at 09:38

score 1 · Answer 2 · answered Aug 20 '15 at 02:46

There are certain restrictions in the way the primary key columns are to be used in the where clause http://docs.datastax.com/en/cql/3.1/cql/cql_reference/select_r.html

One solution that will work in your situation is to change the order of clustering columns in the primary key

CREATE TABLE sr_number_callrecord ( 
id int, 
callerph text, 
sr_number text, 
callid text, 
start_time text, 
plan_id int, 
PRIMARY KEY((sr_number),  callerph, start_time,) 
);

Now you can use range query on the last column as

select * from sr_number_callrecord where sr_number = '1234' and callerph  = '+91123' and start_time >= '1234';

how to do the query in cassandra If i have two cluster key in column family

2 Answers2