Cassandra Python driver doesn't page large queries

Question

It is said in the documentation that cassandra-driver does automatic paging when queries are large enough (with default_fetch_size being 5000 rows) and will return PagedResult.

I have tested reading data from my local Cassandra which contains 9999 rows with SimpleStatement with my own fetch size, but it returned the ResultSet (9999 rows) instead of pages (instance of PagedResult). Also, I tried to change the Session.default_fetch_size but it didn't work as well.

Here's my code..

My first attempt: This is the SimpleStatement code i have made to change the fetch size.

cluster = Cluster()
session = cluster.connect(keyspace_name)

query = "SELECT * FROM user"
statement = SimpleStatement(query, fetch_size=10)
rows = list(session.execute(statement))
print(len(rows))

It prints 9999 (all rows), not 10 rows as I already set the fetch_size.

My second attempt: I tried to change the query fetch size by changing session's default fetch size Session.default_fetch_size.

cluster = Cluster()
session = cluster.connect(keyspace_name)
session.default_fetch_size = 10

query = "SELECT * FROM user"
rows = list(session.execute(query))
print(len(rows))

It also prints 9999 rows instead of 10.

My goal is not to limit the rows from my fetch query, such as SELECT * FROM user LIMIT 10. What I want is to fetch the rows page by page to avoid overload on memory.

So what actually happened?

Note: I am using Cassandra-Driver 3.25 for Python and using Python3.7

I am sorry if my additional information still doesn't make my question a good one. I never ask any questions before. So...any suggestions are welcome :)

Will you please add a minimal sample code that replicates the issue but versions you're using? Here's a friendly note on how to ask good questions. The general guidance is that you (a) provide a good summary of the problem that includes software/component versions, the full error message + full stack trace; (b) describe what you've tried to fix the problem, details of investigation you've done; and (c) minimal sample code that replicates the problem. Cheers! — Erick Ramirez, Aug 05 '22 at 09:40
Thank you for the suggestions, I am adding details to my question — Rafi Adyatma, Aug 09 '22 at 16:34

score 0 · Accepted Answer · answered Aug 10 '22 at 07:21

0

Your test is invalid because your code is faulty.

When you list(), you are in fact "materialising" all the result pages. Your code is not iterating over the rows but retrieving all of the rows.

The driver automatically fetches the next page in the background until there are no more pages to fetch. It may not seem like it but each page only contains fetch_size rows.

Retrieving the next page happens transparently so to you it seems like the results are not getting paged at all but that automatic behaviour from the driver is working as designed. Cheers!

answered Aug 10 '22 at 07:21

Erick Ramirez

13,964
1
18
23

I see, but i want to confirm what i have understood. Let's say i do iterate the rows like `for row in session.execute(statement)` for my first attempt (not using `list()` anymore), it iterates through the first 10 rows. Since the total rows are 9999, it does fetch the next 10 rows from Cassandra, the process repeats until every row has successfully iterated? Am i right? – Rafi Adyatma Aug 10 '22 at 08:22
Correct, yes. The driver fetches the next page without you knowing about it, hence "transparent". It will keep fetching the next page if you keep going until there are no more pages left to fetch. Cheers! – Erick Ramirez Aug 10 '22 at 09:22
ahh i forgot to say this, Thank you! – Rafi Adyatma Aug 11 '22 at 03:52

Cassandra Python driver doesn't page large queries

1 Answers1