Cassandra isn't designed to iterate over the entire data set in a single expensive query like this. If there are, say, 10 petabytes of data, this query would require reading all 10 petabytes off disk, bringing it into memory, and streaming it to the coordinator, which has to resolve tombstones and deduplicate across replicas (you can't just have each replica send its own count or you will massively under/over count) while incrementing a counter. That is not going to finish within a 5 second timeout. You can use aggregation functions over smaller chunks of the data, but not in a single query.
If you really want to make this work, query the system.size_estimates table on each node, and split each token range according to its size so that each read covers an approximate max of, say, 5k partitions. Then issue a count(*) with a TOKEN restriction for each of the split ranges and sum the results of all those queries. This is how the Spark connector does its full table scans in its SELECT * RDDs, so you just have to replicate that.
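A rough sketch of that approach with the Python cassandra-driver is below. It assumes a Murmur3Partitioner token space, a single-column partition key named id, and placeholder keyspace/table names; it only queries one node's size_estimates (in a real cluster you would repeat that query against every node) and skips wraparound ranges for brevity.

```python
from cassandra.cluster import Cluster

KEYSPACE, TABLE = "my_ks", "my_table"   # placeholders
PARTITION_KEY = "id"                    # assumed single-column partition key
TARGET_PARTITIONS_PER_READ = 5000       # ~5k partitions per count(*) read

cluster = Cluster(["127.0.0.1"])        # placeholder contact point
session = cluster.connect()

# system.size_estimates is node-local: it only covers this node's primary ranges,
# so in a cluster you would run this against every node and merge the ranges.
estimates = session.execute(
    "SELECT range_start, range_end, partitions_count "
    "FROM system.size_estimates "
    "WHERE keyspace_name = %s AND table_name = %s",
    (KEYSPACE, TABLE),
)

total = 0
for est in estimates:
    start, end = int(est.range_start), int(est.range_end)
    if end <= start:
        continue  # skipping wraparound ranges to keep the sketch short
    # Split the range so each sub-range holds roughly 5k partitions.
    splits = max(1, min(est.partitions_count // TARGET_PARTITIONS_PER_READ,
                        end - start))
    step = (end - start) // splits
    for i in range(splits):
        lo = start + i * step
        hi = end if i == splits - 1 else start + (i + 1) * step
        row = session.execute(
            f"SELECT count(*) FROM {KEYSPACE}.{TABLE} "
            f"WHERE token({PARTITION_KEY}) > %s AND token({PARTITION_KEY}) <= %s",
            (lo, hi),
        ).one()
        total += row[0]  # count(*) comes back as the first (only) column

print(total)
```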
The easiest option, and probably safer and more accurate (though less efficient), is to use Spark to read the entire data set and then count it, without relying on a server-side aggregation function.
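A minimal sketch of that in PySpark, assuming the spark-cassandra-connector is on the classpath and that the contact point, keyspace, and table names are placeholders:

```python
from pyspark.sql import SparkSession

# Assumes the spark-cassandra-connector is available, e.g. via
# --packages com.datastax.spark:spark-cassandra-connector_2.12:<version>
spark = (
    SparkSession.builder
    .appName("cassandra-full-count")
    .config("spark.cassandra.connection.host", "127.0.0.1")  # placeholder
    .getOrCreate()
)

# Read the whole table; the connector splits it into token-range partitions
# under the hood, so the count is a distributed read rather than one huge query.
df = (
    spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(keyspace="my_ks", table="my_table")  # placeholders
    .load()
)

print(df.count())
```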