
I am getting the below error when running a SELECT query.

{"error":{"name":"ResponseError","info":"Represents an error message from the server","message":"Batch too large","code":8704,"coordinator":"10.29.96.106:9042"}}


1 Answer

Ahh, I get it; you're using Dev Center.

"If result is more than 1000 it is showing this error"

Yes, that's Dev Center preventing you from running queries that can hurt your cluster. Like this:

SELECT * FROM user_request_by_country_by_processworkflow
WHERE created_on <= '2022-01-08T16:19:07+05:30' ALLOW FILTERING;

ALLOW FILTERING is a way to force Cassandra to read multiple partitions in a single query, something it is explicitly designed to warn you against doing. If you really need to run a query like this, you'll want to build a table with a PRIMARY KEY designed specifically to support it.

In this case, I'd recommend "bucketing" your table data by whichever time component keeps the partitions within a reasonable size. For example, if the day keeps the rows-per-partition below 50k, the primary key definition would look like this:

PRIMARY KEY (day, created_on)
WITH CLUSTERING ORDER BY (created_on DESC);
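The `day` partition key above is assumed to be an integer in `yyyymmdd` form (matching the `day=20220108` literal in the query below). Computing that bucket value application-side from a timestamp is straightforward; here's a minimal Python sketch, assuming the bucket is derived in the timestamp's own UTC offset:

```python
from datetime import datetime

def day_bucket(ts: str) -> int:
    """Derive an integer yyyymmdd bucket from an ISO-8601 timestamp.

    Assumes the bucket is computed in the timestamp's own offset,
    matching the created_on values as they were written to the table.
    """
    dt = datetime.fromisoformat(ts)
    return dt.year * 10000 + dt.month * 100 + dt.day

# The timestamp from the original query falls into bucket 20220108:
print(day_bucket("2022-01-08T16:19:07+05:30"))  # → 20220108
```

Whatever convention you pick (local offset vs. normalizing to UTC first), it has to be applied consistently on both the write and read paths, or rows will land in a different bucket than the one you query.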

Then, a query that would work and be allowed would look like this:

SELECT * FROM user_request_by_country_by_processworkflow
WHERE day=20220108
  AND created_on <= '2022-01-08T16:19:07+05:30';

In summary:

  • Don't run multi-partition queries.
  • Don't use ALLOW FILTERING.
  • Do build tables to match queries.
  • Do use time buckets to keep partitions from growing unbounded.
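One consequence of time bucketing worth noting: if a query window spans more than one day, you issue one single-partition SELECT per bucket rather than one multi-partition query. A small Python sketch of generating that bucket list (assuming the integer `yyyymmdd` bucket convention from the example above):

```python
from datetime import date, timedelta

def buckets_between(start: date, end: date):
    """Return the yyyymmdd day buckets covering [start, end] inclusive.

    Each bucket maps to exactly one partition, so each becomes its own
    single-partition SELECT -- no ALLOW FILTERING needed.
    """
    days = (end - start).days
    return [
        int((start + timedelta(days=i)).strftime("%Y%m%d"))
        for i in range(days + 1)
    ]

print(buckets_between(date(2022, 1, 6), date(2022, 1, 8)))
# → [20220106, 20220107, 20220108]
```

The application then runs the bucketed SELECT once per value and concatenates the results, which keeps every individual query cheap for the cluster.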
  • Problem here is I can't change it, because it is an existing table used in so many places. So do you have any workaround, or is it possible to use the LIMIT option with ALLOW FILTERING as a workaround? – Sumant Aug 08 '22 at 14:44
  • @Sumant You could try using Spark. A distributed query layer on top of Cassandra is one way to run OLAP-based queries like this. – Aaron Aug 08 '22 at 15:42