Cassandra performance using IN clause on clustering keys

Question

Consider the following table:

CREATE TABLE base_table(
    partition_key uuid,
    clustering_key1 uuid,
    clustering_key2 uuid,
    regular text,
    PRIMARY KEY((partition_key), clustering_key1, clustering_key2)
);

Prior to Cassandra 2.2, it was not possible to run queries like this:

SELECT * FROM base_table
WHERE partition_key=<UUID1>
AND clustering_key1 IN (<UUID2>,<UUID3>)
AND clustering_key2 < <UUID4>

Indeed, a clustering key could be restricted only if the preceding one was restricted by an equality relation.

Since Cassandra 2.2 this is possible, but does anybody know whether there are caveats to doing it? What performance can be expected: the same as (or close to) a query with no IN clause? Does it scale like an equality relation?

Also, the new storage engine in Cassandra 3.x may have been designed to optimize such requests. If anybody has ideas on this :)

Thanks!

score 3 · Answer 1 · answered Mar 16 '18 at 16:44

Because you're reading from a single partition, there shouldn't be much of a performance impact unless the IN relation contains many elements. There could still be a problem if the < comparison selects too many entries, but that can happen with a single = restriction as well.
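
To make the point about a single partition concrete, here is a rough sketch reusing the table and the placeholder UUIDs from the question (replace them with real values before running). It assumes that the IN restriction returns the same rows as one equality query per listed value, all against the same partition; how the server executes it internally is not shown here. The LIMIT of 10 is only an illustrative way to bound the result size.

```cql
-- The IN query from the question, bounded with an illustrative LIMIT
SELECT * FROM base_table
WHERE partition_key = <UUID1>
AND clustering_key1 IN (<UUID2>, <UUID3>)
AND clustering_key2 < <UUID4>
LIMIT 10;

-- Assumed to return the same rows as these two queries combined,
-- both of which read from the same partition
SELECT * FROM base_table
WHERE partition_key = <UUID1>
AND clustering_key1 = <UUID2>
AND clustering_key2 < <UUID4>;

SELECT * FROM base_table
WHERE partition_key = <UUID1>
AND clustering_key1 = <UUID3>
AND clustering_key2 < <UUID4>;
```

Each value added to the IN list adds one more range of rows to read inside the partition, which is why a long list is where the cost would start to show.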

Alex Ott


Thanks for the reply. Yes, being on the same partition is indeed a good start when looking for performance. I will not select a large dataset: 10 rows at most, using the LIMIT clause. But after reading this article http://thelastpickle.com/blog/2017/03/07/The-limit-clause-in-cassandra-might-not-work-as-you-think.html and other resources on that blog, I realize that the cost of a query also depends on how many SSTables are read. Using TRACING ON is helpful for understanding what is done. – Elendil Mar 19 '18 at 14:07
Yes, that's very useful... If you have many more reads than writes, then the leveled compaction strategy may provide better performance: https://www.datastax.com/dev/blog/when-to-use-leveled-compaction – Alex Ott Mar 19 '18 at 14:59
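
For reference, the two suggestions in these comments could look roughly like the sketch below. TRACING ON and TRACING OFF are cqlsh shell commands rather than CQL statements, so this assumes the session runs in cqlsh. The table and placeholder UUIDs are the ones from the question, and the LIMIT of 10 matches the row count mentioned in the comment. Whether leveled compaction actually helps depends on the workload, so treat the ALTER TABLE as something to evaluate, not a recommendation.

```cql
-- In cqlsh: trace the query to see what each read step does
TRACING ON;

SELECT * FROM base_table
WHERE partition_key = <UUID1>
AND clustering_key1 IN (<UUID2>, <UUID3>)
AND clustering_key2 < <UUID4>
LIMIT 10;

TRACING OFF;

-- Switch the table to leveled compaction (worth testing on read-heavy workloads)
ALTER TABLE base_table
WITH compaction = {'class': 'LeveledCompactionStrategy'};
```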
