I was searching for pagination in Cassandra and found this topic: Results pagination in Cassandra (CQL), with an answer accepted by the majority of people. However, I want to do the same thing across multiple computers. Here is an example.
The problem
Let's say I have three computers connected to the same Cassandra DB. Each computer wants to take a few rows from the following table:
CREATE TABLE IF NOT EXISTS lp_webmap.page (
    domain_name1st text,
    domain_name2nd text,
    domain_name3rd text,
    location text,
    title text,
    rank float,
    updated timestamp,
    PRIMARY KEY (
        (domain_name1st, domain_name2nd, domain_name3rd), location
    )
);
Every computer takes a few rows and performs time-consuming calculations on them. For a fixed partition key (domain_name1st, domain_name2nd, domain_name3rd) and varying clustering key (location), there can still be thousands of results.
And now the problem: how can computer 1 quickly lock the rows it is working on, so that the other computers leave them alone?
Unusable solution
In standard SQL I would use something like this:
CREATE TABLE IF NOT EXISTS lp_registry.page_lock (
    domain_name1st text,
    domain_name2nd text,
    domain_name3rd text,
    page_from int,
    page_count int,
    locked timestamp,
    PRIMARY KEY (
        (domain_name1st, domain_name2nd, domain_name3rd), locked, page_from
    )
) WITH CLUSTERING ORDER BY (locked DESC);
This would allow me to do the following:
- Select the first 10 pages on computer 1 and lock them (page_from=1, page_count=10)
- Check the locks quickly on the other two machines and pick unused pages for calculations
- Take and lock a larger number of pages on faster computers
- Delete all locks for the given partition key after all pages are processed
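To make the intended flow concrete, here is a minimal sketch in plain Python. An in-memory list stands in for the page_lock rows of a single partition key, and the names LockTable, PageLock, claim and release_all are hypothetical, made up for this illustration. It deliberately ignores concurrency: on real machines, reading the locks and then writing a new one are two separate steps, so two computers could claim the same range.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageLock:
    # Mirrors the page_from / page_count columns of page_lock.
    page_from: int
    page_count: int


class LockTable:
    """Hypothetical in-memory stand-in for the page_lock rows of one partition key."""

    def __init__(self, total_pages: int) -> None:
        self.total_pages = total_pages
        self.locks: List[PageLock] = []

    def claim(self, wanted: int) -> Optional[PageLock]:
        """Lock up to `wanted` pages following the highest range claimed so far."""
        # The first free page is right after the furthest locked range (pages start at 1).
        next_page = max(
            (lock.page_from + lock.page_count for lock in self.locks),
            default=1,
        )
        if next_page > self.total_pages:
            return None  # everything is already claimed
        count = min(wanted, self.total_pages - next_page + 1)
        lock = PageLock(next_page, count)
        self.locks.append(lock)
        return lock

    def release_all(self) -> None:
        """Delete all locks once every page has been processed."""
        self.locks.clear()


table = LockTable(total_pages=35)
first = table.claim(10)    # computer 1 takes pages 1..10
second = table.claim(20)   # a faster computer takes a bigger range
third = table.claim(10)    # only 5 pages are left
fourth = table.claim(10)   # nothing left to claim
```

Each computer would call claim with a size matching its speed, and release_all corresponds to deleting the locks for the partition key at the end.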
Question
However, I can't do LIMIT 20,10 in Cassandra, and I also can't do this, since I want to paginate on different computers. Is there any way I can paginate through these pages quickly?