
I have built a pipeline that fetches data from a MySQL database in batches, iterating until the entire dataset has been processed.

offset = 0
while True:
    # parameterized query (%s placeholders, as in aiomysql/PyMySQL);
    # the offset must not be quoted as a string literal
    await cursor.execute(
        "SELECT * FROM candidate LIMIT %s OFFSET %s", (100, offset)
    )
    data = await cursor.fetchall()

    if len(data) == 0:
        break  # stop once the candidate table is exhausted

    # perform some operations on this data
    # processed data is written to NoSQL database

    # advance the offset by the batch size for the next batch
    offset += 100

Currently, this operation is sequential, meaning the batches are processed one at a time, and the resulting latency is causing problems. Can anyone help me parallelize this?

How can I execute three to four batches in parallel and stop once the entire table has been processed? Please provide some code examples or pseudocode illustrating the logic, so I can implement it properly.
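One way to sketch this is with a fixed pool of `asyncio` workers that each claim the next unclaimed offset, fetch that batch, and stop as soon as a fetch comes back empty. The sketch below replaces the real database call with a fake in-memory table (`fetch_batch` is a stand-in; in production it would be the parameterized `aiomysql` query), so the concurrency logic can be seen in isolation:

```python
import asyncio

BATCH_SIZE = 100
NUM_WORKERS = 4

# Stand-in for the candidate table; in production fetch_batch would run
# "SELECT * FROM candidate LIMIT %s OFFSET %s" against MySQL instead.
FAKE_TABLE = list(range(250))  # 250 rows -> three non-empty batches

async def fetch_batch(offset):
    await asyncio.sleep(0)  # simulate awaiting the database
    return FAKE_TABLE[offset:offset + BATCH_SIZE]

processed = []  # stand-in for "written to the NoSQL database"

def make_offset_counter():
    # Hands out 0, 100, 200, ... — safe without a lock because asyncio
    # runs on one thread and there is no await between read and update.
    state = {"offset": 0}
    def next_offset():
        off = state["offset"]
        state["offset"] += BATCH_SIZE
        return off
    return next_offset

async def worker(next_offset):
    while True:
        offset = next_offset()      # claim the next batch's offset
        rows = await fetch_batch(offset)
        if not rows:                # past the end of the table: stop
            return
        processed.extend(rows)      # process / write this batch

async def main():
    next_offset = make_offset_counter()
    # Up to NUM_WORKERS batches are in flight at once; gather returns
    # once every worker has hit an empty batch.
    await asyncio.gather(*(worker(next_offset) for _ in range(NUM_WORKERS)))

asyncio.run(main())
```

Each worker terminates independently when its claimed offset lands past the end of the table, so the whole pool winds down without any extra coordination. Note that rows are processed out of order across workers, which is fine when each batch is written independently to the NoSQL store.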

user_12
  • Well, I had a similar question recently https://stackoverflow.com/q/67958976/4687565 . Wrap data acquisition into a generator and any answer fits. Or any answer to linked questions. I'm not very familiar with `await` but I'm pretty sure they are not needed here. – Dimitry Jun 29 '21 at 17:31
  • @Dimitry The problem here, how can I control `limit` in query and stop once entire data from table is processed. – user_12 Jun 30 '21 at 01:15
  • Well, isn't that an entirely different question. Doesn't documentation answer it? `cursor.execute('select * from candidate limit ? offset ?', (limit_val, offset_val))` And I guess it'll start returning empty tables once offset is beyond table size. – Dimitry Jun 30 '21 at 09:40
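Dimitry's suggestion above — bind `limit` and `offset` as query parameters and detect the end of the table by an empty result — can be demonstrated with the stdlib `sqlite3` module, which happens to use the same `?` placeholder style (note this is an assumption for illustration: MySQL drivers such as aiomysql/PyMySQL use `%s` placeholders instead, but the binding idea is identical):

```python
import sqlite3

# In-memory stand-in for the candidate table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidate (id INTEGER)")
conn.executemany("INSERT INTO candidate VALUES (?)", [(i,) for i in range(250)])

# limit/offset are bound as parameters, never interpolated into the SQL
limit_val, offset_val = 100, 200
rows = conn.execute(
    "SELECT * FROM candidate LIMIT ? OFFSET ?", (limit_val, offset_val)
).fetchall()

# An offset past the end of the table simply yields an empty result,
# which is the natural termination condition for the batching loop.
tail = conn.execute(
    "SELECT * FROM candidate LIMIT ? OFFSET ?", (100, 300)
).fetchall()
```

With 250 rows, the first query returns the final partial batch of 50 rows and the second returns nothing, so `len(data) == 0` remains a reliable stopping test.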

0 Answers