I have a very large Cassandra table containing over 1 billion records. Its primary key is "(partition_id, cluster_id1, cluster_id2)". For a few particular partition_id values there are so many records that I can't run a row count on those partitions without a timeout exception being raised.
What I ran in cqlsh is:
SELECT count(*) FROM relation WHERE partition_id='some_huge_partition';
I got this exception:
ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
I tried setting --connect-timeout and --request-timeout, with no luck. I counted the same data in Elasticsearch, and the row count there is approximately 30 million for the same partition.
My Cassandra is 3.11.2 and cqlsh is 5.0.1. The cluster contains 3 nodes, each with more than 1 TB of HDD (fairly old servers, more than 8 years old).
So in short, my questions are:
- How can I count it? Is it even possible to count a huge partition in Cassandra?
- Can I use the COPY TO command with the partition key as its filter, so that I can count the rows in the exported CSV file?
- Is there a way to monitor the insert process, so I can act before any partition grows too huge?
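One workaround I'm considering (so the question is concrete): since a single server-side count(*) has to read the entire ~30 M-row partition within the read timeout, the count could instead be done client-side in pages ordered by the clustering key. Below is a minimal sketch of that loop. `fetch_page` stands in for the real driver query (e.g. `SELECT cluster_id1 FROM relation WHERE partition_id = ? AND cluster_id1 > ? LIMIT ?` via the DataStax Python driver); the in-memory stub exists only so the loop can be exercised without a cluster, and the cursor logic assumes the value paged on uniquely orders rows within the partition (with a compound clustering key you would resume on both columns).

```python
import bisect

def count_partition(fetch_page, page_size=5000):
    """Count rows in one partition by summing page sizes until a short
    page signals the end of the partition."""
    total = 0
    last_key = None  # clustering-key cursor; None means "start of partition"
    while True:
        rows = fetch_page(last_key, page_size)
        total += len(rows)
        if len(rows) < page_size:
            return total
        last_key = rows[-1]  # resume after the last clustering key seen

# Stand-in for the Cassandra query, backed by a sorted in-memory list so
# the paging logic can be tested without a cluster. A real fetch_page
# would execute the prepared SELECT shown above.
def make_fake_fetch(keys):
    keys = sorted(keys)
    def fetch_page(after, limit):
        start = 0 if after is None else bisect.bisect_right(keys, after)
        return keys[start:start + limit]
    return fetch_page

fetch = make_fake_fetch(range(12345))
print(count_partition(fetch, page_size=1000))  # 12345
```

This trades one long coordinator-side aggregation for many small reads, each of which easily fits inside the request timeout.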
Big thanks in advance.