
I am trying to read a 100GB+ table in Python using the pymysql package. The query I am firing is

select * from table

But I want to process the records in chunks instead of pulling all 100 GB from the database at once. Below is my code:

    with self.connection.cursor() as cursor:
        logging.info("Executing Read query")
        logging.info(cursor.mogrify(query))
        cursor.execute(query)
        schema = cursor.description  # column metadata for the result set
        size = cursor.rowcount       # total number of rows returned
        for i in range((size // batch) + 1):
            records = cursor.fetchmany(size=batch)
            yield records, schema

But when the query gets executed at cursor.execute(query), it tries to fetch all 100 GB of records into memory and ends up killing the process.

Is there a better way to read data in chunks from MySQL using Python?
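For illustration, here is a minimal sketch of the kind of chunked streaming I am after, assuming pymysql's unbuffered `SSCursor` keeps the result set on the server so `fetchmany` pulls rows incrementally; the connection settings and table name below are placeholders:

    import pymysql
    import pymysql.cursors

    # Placeholder connection settings -- adjust to the real environment.
    connection = pymysql.connect(
        host="localhost",
        user="user",
        password="password",
        database="mydb",
        cursorclass=pymysql.cursors.SSCursor,  # unbuffered cursor: rows are streamed, not preloaded
    )

    batch = 10_000  # rows per chunk

    def read_in_chunks(query):
        with connection.cursor() as cursor:
            cursor.execute(query)
            schema = cursor.description
            while True:
                records = cursor.fetchmany(size=batch)
                if not records:
                    break
                yield records, schema

    for records, schema in read_in_chunks("SELECT * FROM my_table"):
        print(len(records))  # process each chunk here

Note that with an unbuffered cursor `rowcount` is not reliable before all rows are fetched, so the loop runs until `fetchmany` returns an empty result instead of counting batches up front.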

  • https://stackoverflow.com/questions/6195988/about-mysql-cursor-and-iterator – waynetech Jul 24 '19 at 18:59
  • 2
    Any reason you can't paginate through the data using MySQL's [`LIMIT`](https://dev.mysql.com/doc/refman/8.0/en/select.html#idm139812212181472) functionality? – esqew Jul 24 '19 at 19:05
  • What I want to understand is how fetchmany executes: does it ask MySQL for that many rows, or are the results already there and it just does the slicing? – Deepak Verma Jul 24 '19 at 19:28
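A minimal sketch of the `LIMIT` pagination esqew suggests above, assuming the table has an indexed `id` column to order by (connection settings and names are placeholders; note that large offsets get progressively slower because MySQL still has to skip over the earlier rows):

    import pymysql

    # Placeholder connection settings -- adjust to the real environment.
    connection = pymysql.connect(host="localhost", user="user",
                                 password="password", database="mydb")
    batch = 10_000  # rows per page

    with connection.cursor() as cursor:
        offset = 0
        while True:
            cursor.execute(
                "SELECT * FROM my_table ORDER BY id LIMIT %s OFFSET %s",
                (batch, offset),
            )
            records = cursor.fetchall()
            if not records:
                break
            # process the current page of rows here
            offset += batch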

0 Answers