
I am getting the below error when running a SELECT query.

{"error":{"name":"ResponseError","info":"Represents an error message from the server","message":"Batch too large","code":8704,"coordinator":"10.29.96.106:9042"}}


1 Answer

Ahh, I get it; you're using Dev Center.

"If result is more than 1000 it is showing this error"

Yes, that's Dev Center preventing you from running queries that can hurt your cluster. Like this:

SELECT * FROM user_request_by_country_by_processworkflow
WHERE created_on <= '2022-01-08T16:19:07+05:30' ALLOW FILTERING;

ALLOW FILTERING is a way to force Cassandra to read multiple partitions in a single query, something it is explicitly designed to warn you against doing. If you really need to run a query like this, you'll want to build a table with a PRIMARY KEY designed specifically to support it.

In this case, I'd recommend "bucketing" your table data by whichever time component keeps the partitions within a reasonable size. For example, if the day keeps the rows-per-partition below 50k, the primary key definition would look like this:

PRIMARY KEY (day, created_on)
WITH CLUSTERING ORDER BY (created_on DESC);
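The `day` partition key above is assumed to be an integer in `yyyymmdd` form (matching the `day=20220108` literal in the query below). Computing that bucket value application-side from a timestamp is straightforward; here's a minimal Python sketch, assuming the bucket is derived in the timestamp's own UTC offset:

```python
from datetime import datetime

def day_bucket(ts: str) -> int:
    """Derive an integer yyyymmdd bucket from an ISO-8601 timestamp.

    Assumes the bucket is computed in the timestamp's own offset,
    matching the created_on values as they were written to the table.
    """
    dt = datetime.fromisoformat(ts)
    return dt.year * 10000 + dt.month * 100 + dt.day

# The timestamp from the original query falls into bucket 20220108:
print(day_bucket("2022-01-08T16:19:07+05:30"))  # → 20220108
```

Whatever convention you pick (local offset vs. normalizing to UTC first), it has to be applied consistently on both the write and read paths, or rows will land in a different bucket than the one you query.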

Then, a query that would work and be allowed would look like this:

SELECT * FROM user_request_by_country_by_processworkflow
WHERE day=20220108
  AND created_on <= '2022-01-08T16:19:07+05:30';

In summary:

  • Don't run multi-partition queries.
  • Don't use ALLOW FILTERING.
  • Do build tables to match queries.
  • Do use time buckets to keep partitions from growing unbounded.
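One consequence of time bucketing worth noting: if a query window spans more than one day, you issue one single-partition SELECT per bucket rather than one multi-partition query. A small Python sketch of generating that bucket list (assuming the integer `yyyymmdd` bucket convention from the example above):

```python
from datetime import date, timedelta

def buckets_between(start: date, end: date):
    """Return the yyyymmdd day buckets covering [start, end] inclusive.

    Each bucket maps to exactly one partition, so each becomes its own
    single-partition SELECT -- no ALLOW FILTERING needed.
    """
    days = (end - start).days
    return [
        int((start + timedelta(days=i)).strftime("%Y%m%d"))
        for i in range(days + 1)
    ]

print(buckets_between(date(2022, 1, 6), date(2022, 1, 8)))
# → [20220106, 20220107, 20220108]
```

The application then runs the bucketed SELECT once per value and concatenates the results, which keeps every individual query cheap for the cluster.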
  • Problem here is I can't change it, because it is an existing table used in so many places. So do you have any workaround, or is it possible to use the LIMIT option with ALLOW FILTERING as a workaround? – Sumant Aug 08 '22 at 14:44
  • @Sumant You could try using Spark. A distributed query layer on top of Cassandra is one way to run OLAP-based queries like this. – Aaron Aug 08 '22 at 15:42