
I need to make an API request for several pieces of data, and then process each result. The request is paginated, so I'm currently doing:

def get_results():
    while True:
        response = api(num_results=5)
        if response is None:  # No more results
            break
        yield response

def process_data():
    for page in get_results():
        for result in page:
            do_stuff(result)

process_data()

I'm hoping to use asyncio to retrieve the next page of results from the API while I'm processing the current one, instead of waiting for results, processing them, then waiting again. I've modified the code to:

import asyncio

async def get_results():
    while True:
        response = api(num_results=5)
        if response is None:  # No more results
            break
        yield response

async def process_data():
    async for page in get_results():
        for result in page:
            do_stuff(result)

asyncio.run(process_data())

I'm not sure if this is doing what I intend it to. Is this the right way to make processing the current page of API results and getting the next page of results asynchronous?

  • To use asyncio, the API you're calling needs to be async itself. A good indicator is that you must **await** its result. If none of your `async def` functions contain an await, that's a hint that they're not really async. – user4815162342 Nov 23 '19 at 08:28
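
If `api()` is an ordinary blocking function, as the code suggests, one way to get something you can genuinely `await` is to push the call into a worker thread, for example with `asyncio.to_thread` (Python 3.9+). A minimal sketch, assuming `api` stays synchronous:

import asyncio

async def get_results():
    while True:
        # run the blocking call in a worker thread; awaiting it keeps
        # the event loop free to do other work in the meantime
        response = await asyncio.to_thread(api, num_results=5)
        if response is None:  # No more results
            break
        yield response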

1 Answer


Maybe you can use `asyncio.Queue` to refactor your code into a producer/consumer pattern:

import asyncio
import random

async def api(num_results):
    # you could use aiohttp here to fetch the real API

    # fake content
    await asyncio.sleep(1)
    fake_response = random.random()
    if fake_response < 0.1:
        return None
    return fake_response

async def get_results(q):
    while True:
        response = await api(num_results=5)
        if response is None:
            # tell the consumer the producer is done
            print('Producer Done')
            await q.put(None)
            break
        print('Producer:', response)
        await q.put(response)

async def process_data(q):
    while True:
        data = await q.get()
        if data is None:
            print('Consumer Done')
            break
        # process the data however you want; if it's CPU intensive,
        # you can offload it with loop.run_in_executor
        # fake: processing takes a little time
        await asyncio.sleep(3)
        print('Consume', data)

async def main():
    q = asyncio.Queue()
    # run producer and consumer concurrently on the same queue
    await asyncio.gather(get_results(q), process_data(q))

asyncio.run(main())
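
The `await asyncio.sleep(3)` above only simulates the processing time. If your real processing is CPU-intensive, here is a hedged sketch of the `run_in_executor` route mentioned in the comment, assuming `do_stuff` is an ordinary synchronous, module-level function (it has to be picklable to go to a process pool):

import asyncio
from concurrent.futures import ProcessPoolExecutor

async def process_data(q):
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        while True:
            data = await q.get()
            if data is None:
                print('Consumer Done')
                break
            # run_in_executor pushes the CPU-heavy call into a worker
            # process, so the event loop stays free to fetch more pages
            await loop.run_in_executor(pool, do_stuff, data)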

Coming back to the question:

Is this the right way to make processing the current page of API results and getting the next page of results asynchronous?

It's not the right way: even though the functions are declared async, nothing in them ever awaits, so get_results() is only resumed after do_stuff(result) has finished for every result on the current page. Fetching and processing still happen strictly one after the other.
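
For the overlap you originally asked about, one alternative sketch (separate from the queue approach above) is to prefetch the next page with asyncio.create_task while the current one is processed. This assumes api() has been made awaitable, as discussed in the comments, and that do_stuff is a blocking function:

import asyncio

async def process_data():
    # start fetching the first page
    next_page = asyncio.create_task(api(num_results=5))
    while True:
        page = await next_page
        if page is None:  # no more results
            break
        # request the next page before processing the current one
        next_page = asyncio.create_task(api(num_results=5))
        for result in page:
            # running do_stuff in a thread yields to the event loop,
            # so the prefetch task can actually make progress
            await asyncio.to_thread(do_stuff, result)

asyncio.run(process_data())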