Cassandra CLUSTERING ORDER BY is not working and showing in correct results

Question

Hi I have created a table for storing data of like this

CREATE TABLE keyspace.test (
name text,
date text,
time double,
entry text,
details text,
PRIMARY KEY ((name, date), time)
) WITH CLUSTERING ORDER BY (time DESC);

And inserted data into the table.But a query like this gives an unordered result.

SELECT * FROM keyspace.test where device_id   name ='anand' and date in ('2017-04-01','2017-04-02','2017-04-03','2017-04-05') ;

Is there any problem with my table design.

Not to self-promote or anything, but I wrote an article on this topic back in 2015 that might help: http://www.datastax.com/dev/blog/we-shall-have-order — Aaron, Apr 08 '17 at 21:12
@Aaron I had read your blog. Is there any alternative to save my issue ? — Anand Tagore, Apr 09 '17 at 16:07

Ashraful Islam · Answer 1 · 2017-04-08T19:14:02.113

I think you are misunderstanding cassandra clustering key order. Cassandra Sort data with cluster key within a single partition.

That is for your case cassandra sort data with clustering key time within a single name and date.

Example : Let's insert some data

INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-01', 1, 'a');
INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-01', 2, 'b');
INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-01', 3, 'c');
INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-02', 0, 'nil');
INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-02', 4, 'd');

If we select data with your query :

SELECT * FROM test where name ='anand' and date in ('2017-04-01','2017-04-02','2017-04-03','2017-04-05') ;

Output :

 name  | date       | time | details | entry
-------+------------+------+---------+-------
 anand | 2017-04-01 |    3 |    null |     c
 anand | 2017-04-01 |    2 |    null |     b
 anand | 2017-04-01 |    1 |    null |     a
 anand | 2017-04-02 |    4 |    null |     d
 anand | 2017-04-02 |    0 |    null |   nil

You can see that time 3,2,1 are within a single partition anand:2017-04-01 are sorted in desc And time 4,0 are within single partition anand:2017-04-02 are sorted in desc. Cassandra will not take care of sorting between different partition.

Here is the doc :

In the table definition, a clustering column is a column that is part of the compound primary key definition, but not the first column, which is the position reserved for the partition key. Columns are clustered in multiple rows within a single partition. The clustering order is determined by the position of columns in the compound primary key definition.

Source : http://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_compound_keys_c.html

By the way why is your data field is text type and time field is double type ?
You can use date field as date type and time as timestamp type.

the fields are designed as per the use cases. I have some questions 1.Do I need to change my table design in order to get result ? 2. Is there any performance issues when we query data with paging off ? — Anand Tagore, Apr 09 '17 at 16:11
Then you should not use in query. Use executeAsync (You can control how many async query you want to execute one at a time) or change your data model. — Ashraful Islam, Apr 09 '17 at 16:27
Execute async execute query parallely. So Too many executeAsync at a time can load pressure on your cluster. Check this answer http://stackoverflow.com/a/30526719/2320144 — Ashraful Islam, Apr 09 '17 at 16:36

score 2 · Answer 2 · answered Apr 08 '17 at 21:04

2

The query that you are using is o.k. but it probably doesn't behave as you are expecting it to because coordinator will not sort the results based on partitions. I also run into this problem couple of times.

The solution to it is very simple, basically It's far better to execute the 4 separate queries that you need on the client and then merge the results there. In short IN operator puts a lot of pressure to the coordinator node in the cluster, there's a nice read on this subject:

https://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/

answered Apr 08 '17 at 21:04

Marko Švaljek

2,071
1
14
26

It is very hard to run seperate queries. – Anand Tagore Apr 09 '17 at 16:13
1

Should not be that hard unless you are using some exotic framework, even then. If you are using futures you can easily chain them. Generally you would just iterate over the params that you would have in `IN`. Just as a side story back in the day when I was starting out If I complained to my mentor about the loops he would just say to me that one is a special case of many :) – Marko Švaljek Apr 09 '17 at 19:21
@MarkoŠvaljek well, if you need sorted and paginated data then client side sorting is not an option. – walv Jul 19 '19 at 12:22

Cassandra CLUSTERING ORDER BY is not working and showing in correct results

2 Answers2