
I have built a pipeline that fetches data from a MySQL database in batches, iterating until the entire dataset has been processed.

offset = 0
while True:
    # parameterized query (%s placeholders, as in aiomysql/PyMySQL);
    # the offset must not be quoted as a string literal
    await cursor.execute(
        "SELECT * FROM candidate LIMIT %s OFFSET %s", (100, offset)
    )
    data = await cursor.fetchall()

    if len(data) == 0:
        break  # stop once the candidate table is exhausted

    # perform some operations on this data
    # processed data is written to NoSQL database

    # advance the offset by the batch size for the next batch
    offset += 100

Currently, this operation is sequential, meaning the batches are processed one at a time, and the resulting latency is causing problems. Can anyone help me parallelize this?

How can I execute three to four batches in parallel and stop once the entire table has been processed? Please provide some code examples or pseudocode illustrating the logic, so I can implement it properly.
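One way to sketch this is with a fixed pool of `asyncio` workers that each claim the next unclaimed offset, fetch that batch, and stop as soon as a fetch comes back empty. The sketch below replaces the real database call with a fake in-memory table (`fetch_batch` is a stand-in; in production it would be the parameterized `aiomysql` query), so the concurrency logic can be seen in isolation:

```python
import asyncio

BATCH_SIZE = 100
NUM_WORKERS = 4

# Stand-in for the candidate table; in production fetch_batch would run
# "SELECT * FROM candidate LIMIT %s OFFSET %s" against MySQL instead.
FAKE_TABLE = list(range(250))  # 250 rows -> three non-empty batches

async def fetch_batch(offset):
    await asyncio.sleep(0)  # simulate awaiting the database
    return FAKE_TABLE[offset:offset + BATCH_SIZE]

processed = []  # stand-in for "written to the NoSQL database"

def make_offset_counter():
    # Hands out 0, 100, 200, ... — safe without a lock because asyncio
    # runs on one thread and there is no await between read and update.
    state = {"offset": 0}
    def next_offset():
        off = state["offset"]
        state["offset"] += BATCH_SIZE
        return off
    return next_offset

async def worker(next_offset):
    while True:
        offset = next_offset()      # claim the next batch's offset
        rows = await fetch_batch(offset)
        if not rows:                # past the end of the table: stop
            return
        processed.extend(rows)      # process / write this batch

async def main():
    next_offset = make_offset_counter()
    # Up to NUM_WORKERS batches are in flight at once; gather returns
    # once every worker has hit an empty batch.
    await asyncio.gather(*(worker(next_offset) for _ in range(NUM_WORKERS)))

asyncio.run(main())
```

Each worker terminates independently when its claimed offset lands past the end of the table, so the whole pool winds down without any extra coordination. Note that rows are processed out of order across workers, which is fine when each batch is written independently to the NoSQL store.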

user_12
  • Well, I had a similar question recently https://stackoverflow.com/q/67958976/4687565 . Wrap data acquisition into a generator and any answer fits. Or any answer to linked questions. I'm not very familiar with `await` but I'm pretty sure they are not needed here. – Dimitry Jun 29 '21 at 17:31
  • @Dimitry The problem here, how can I control `limit` in query and stop once entire data from table is processed. – user_12 Jun 30 '21 at 01:15
  • Well, isn't that an entirely different question. Doesn't documentation answer it? `cursor.execute('select * from candidate limit ? offset ?', (limit_val, offset_val))` And I guess it'll start returning empty tables once offset is beyond table size. – Dimitry Jun 30 '21 at 09:40
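Dimitry's suggestion above — bind `limit` and `offset` as query parameters and detect the end of the table by an empty result — can be demonstrated with the stdlib `sqlite3` module, which happens to use the same `?` placeholder style (note this is an assumption for illustration: MySQL drivers such as aiomysql/PyMySQL use `%s` placeholders instead, but the binding idea is identical):

```python
import sqlite3

# In-memory stand-in for the candidate table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidate (id INTEGER)")
conn.executemany("INSERT INTO candidate VALUES (?)", [(i,) for i in range(250)])

# limit/offset are bound as parameters, never interpolated into the SQL
limit_val, offset_val = 100, 200
rows = conn.execute(
    "SELECT * FROM candidate LIMIT ? OFFSET ?", (limit_val, offset_val)
).fetchall()

# An offset past the end of the table simply yields an empty result,
# which is the natural termination condition for the batching loop.
tail = conn.execute(
    "SELECT * FROM candidate LIMIT ? OFFSET ?", (100, 300)
).fetchall()
```

With 250 rows, the first query returns the final partial batch of 50 rows and the second returns nothing, so `len(data) == 0` remains a reliable stopping test.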

0 Answers