
I am using the DataStax Java driver to query ScyllaDB, and I see this error while reading data from Scylla:

`RequestHandler: ip:9042 replied with server error (clustering key cartesian product size 600 is greater than maximum 100), defuncting connection`

ROHAN VADJE

2 Answers


This error is returned in order to prevent overly large restriction sets from being generated, which could put a strain on your server. The cartesian product size is the product of the number of values in each clustering key column's IN restriction, so e.g. `ck1 IN <3 values> AND ck2 IN <200 values>` yields 3 × 200 = 600 combinations. If you're aware of the risks and know a reasonable upper bound on the number of restrictions for your queries, you can manually raise the maximum in scylla.yaml, e.g. `max_clustering_key_restrictions_per_query: 650`. Note, however, that this option has a warning in its description, and it should be acknowledged:

Maximum number of distinct clustering key restrictions per query.
This limit places a bound on the size of IN tuples, especially when multiple
clustering key columns have IN restrictions. Increasing this value can result
in server instability.
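
For reference, raising the cap is a one-line change in scylla.yaml; 650 is just the example value from above, not a recommendation:

```yaml
# scylla.yaml
# Default is 100; raise only if you know your queries' upper bound.
max_clustering_key_restrictions_per_query: 650
```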

In particular, setting this flag above a couple of hundred is risky - 600 should be alright, but at this point you could also consider rephrasing your queries so that they have fewer values in their IN restrictions - perhaps splitting some queries into multiple smaller ones (see the sketch below).
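
For illustration, here is a minimal sketch of that splitting approach with the DataStax Java driver 4.x. The table `ks.tbl (pk text, ck int, v text, PRIMARY KEY (pk, ck))`, the method names, and the chunk size are all hypothetical examples, not part of the original answer:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.Row;
import java.util.List;

public class ChunkedInQuery {
    // Keep each IN list at or below the server-side limit (100 by default).
    private static final int CHUNK_SIZE = 100;

    static void readInChunks(CqlSession session, String pk, List<Integer> ckValues) {
        // "IN ?" takes a single bound list value.
        PreparedStatement ps =
                session.prepare("SELECT * FROM ks.tbl WHERE pk = ? AND ck IN ?");
        for (int i = 0; i < ckValues.size(); i += CHUNK_SIZE) {
            List<Integer> chunk =
                    ckValues.subList(i, Math.min(i + CHUNK_SIZE, ckValues.size()));
            for (Row row : session.execute(ps.bind(pk, chunk))) {
                System.out.println(row.getFormattedContents());
            }
        }
    }
}
```

Each chunk stays within the default limit, so the server-side cap never needs to be raised.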

Source from Scylla tracker: https://github.com/scylladb/scylla/pull/4797

Piotr Sarna
  • Do we have a flag to enable or disable this constraint? – ROHAN VADJE Jan 28 '20 at 03:17
  • After the config change I am getting this error: partition key cartesian product size 200 is greater than maximum 100. Is there a property like max_partition_key_restrictions_per_query for reading by partition key? – ROHAN VADJE Jan 28 '20 at 04:24
  • @ROHANVADJE there is another one for partition keys: `max_partition_key_restrictions_per_query` (https://github.com/scylladb/scylla/blob/12bc965f713336792ee5d32609b3885ff09c25aa/db/config.cc#L722). There are two options that can be configured: https://github.com/scylladb/scylla/pull/4797/files#diff-4353a8d64e8ae43313db47c65cca1c69R689-R695 – Ivan Prisyazhnyy Jan 28 '20 at 07:01
  • Thank you, it's working now. But my reads are very slow compared to Cassandra. Do you know of any configs I need to tweak to get better read performance? – ROHAN VADJE Jan 28 '20 at 08:39
  • @ROHANVADJE there was an answer to a similar question here: https://stackoverflow.com/a/59687819/6906571. Share what you see, what you do, and how you configure things so we can give you any suggestions. Generally, there should be no use case under which Scylla performs worse. – Ivan Prisyazhnyy Jan 28 '20 at 09:06
  • @IvanPrisyazhnyy I am running Scylla inside Docker (image provided by ScyllaDB). The content of /etc/scylla.d/cpuset.conf is: `# DO NO EDIT # This file should be automatically configure by scylla_cpuset_setup # # CPUSET="--cpuset 0 --smp 1"`. Looks like I am using a single core, right? I am using a machine with 32 cores. – ROHAN VADJE Jan 28 '20 at 09:27
  • @ROHANVADJE yes, with CPUSET="--cpuset 0 --smp 1" you are restricting ScyllaDB to use only the first core of your CPU. Some of those options are described here: https://hub.docker.com/r/scylladb/scylla/ (see the sketch after this thread) – Ivan Prisyazhnyy Jan 29 '20 at 08:35
  • @ROHANVADJE - In addition to Ivan's suggested link, there is documentation on best practices for using Docker with Scylla, posted at https://docs.scylladb.com/operating-scylla/procedures/tips/best_practices_scylla_on_docker/ Scylla would be expected to scale linearly with the number of cores it is given, unless you hit another bottleneck such as I/O throughput, so you should certainly see throughput scale as you increase the core count. – Greg Jan 30 '20 at 06:51
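
For reference, minimal sketches of the two changes discussed in this thread; the values are illustrative examples, not recommendations:

```yaml
# scylla.yaml — the companion option for partition keys (default is 100)
max_partition_key_restrictions_per_query: 250
```

```sh
# Start the Scylla container with more cores, per the Docker Hub page above;
# 8 is an arbitrary example for a 32-core machine.
docker run --name some-scylla -d scylladb/scylla --smp 8
```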

It depends on the data shape and concurrency. If your rows are large and the concurrency is high, it is easy to make Scylla run out of memory. If your rows are small and/or the concurrency is low, everything will be fine. It's okay to increase the parameter value; just be aware that you're on dangerous ground, and you should try to reduce your IN query cartesian product sizes.

The maximum value it can be set to is 1000000000.

javaamtho