I want to scrape a set of categories, each of which has its own list of URLs. So in main() I decided to call one function per category, and inside that inner function the URLs are fetched with non-blocking calls.

So here is the code:

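import asyncio

def main():
    loop = asyncio.get_event_loop()
    # One task per category (p_task/f_task avoid shadowing the coroutine function f)
    p_task = loop.create_task(f("p", all_p_list))
    f_task = loop.create_task(f("f", all_f_list))

    loop.run_until_complete(asyncio.gather(p_task, f_task))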

This should run f for both categories concurrently.

But f also has to run the loop itself, because inside it I fetch all of the category's URLs concurrently, one call per URL.

async def f(category, total):
    urls = [urls_template[category].format(t) for t in total]
    soups_coro = map(parseURL_async, urls)

    loop = asyncio.get_event_loop()
    # This is the call that fails: the loop is already running at this point
    result = await loop.run_until_complete(asyncio.gather(*soups_coro))

But when I run the script, I get a "RuntimeError: This event loop is already running" error, and I found that this is because I call loop.run_until_complete() in both the inner and the outer function.

However, if I remove run_until_complete() and just call f() in main(), the call returns immediately and does not wait for the inner function to finish. So running the loop in main() seems unavoidable, but that appears to be incompatible with the inner function, which also has to run it.

How can I deal with this problem and run the loop? The original code was all in a single main() and it worked, but I'd like to make it cleaner if possible.

Blaszard

2 Answers

How can I deal with the problem and run the loop?

The loop is already running. You don't need to (and can't) run it again.

result = await loop.run_until_complete(asyncio.gather(*soups_coro))

You're awaiting the wrong thing. loop.run_until_complete doesn't return something you can await (a Future); it returns the result of whatever you're running until completion.
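Both points can be checked with a small sketch. The coroutine names work, inner and outer are hypothetical, and a bare Future is used so no coroutine is left un-awaited. Called from synchronous code, run_until_complete hands back the plain result; called from inside a coroutine that the same loop is already running, it raises the RuntimeError from the question.

```python
import asyncio


async def work():
    # Hypothetical coroutine standing in for real scraping work.
    await asyncio.sleep(0)
    return 42


async def inner():
    loop = asyncio.get_running_loop()
    # A bare Future, so no coroutine object is left un-awaited.
    pending = loop.create_future()
    try:
        # We are inside a coroutine this loop is already running,
        # so asking the same loop to run again is rejected.
        loop.run_until_complete(pending)
    except RuntimeError as exc:
        return str(exc)
    return "no error"


def outer():
    loop = asyncio.new_event_loop()
    try:
        # From synchronous code, run_until_complete returns the
        # coroutine's result (42), not something you could await.
        value = loop.run_until_complete(work())
        message = loop.run_until_complete(inner())
    finally:
        loop.close()
    return value, message


value, message = outer()
```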

The reason nothing appears to happen when you call f directly is that f is a coroutine function. Calling it only creates a coroutine object, which has to be scheduled on the event loop (awaited, or wrapped in a task); its body doesn't execute until a running event loop drives it. loop.run_until_complete takes care of all of that for you.
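A minimal illustration, using a hypothetical one-argument stand-in for f that records when its body runs: calling it does nothing visible until an event loop drives the resulting coroutine object (asyncio.run needs Python 3.7+).

```python
import asyncio

calls = []


async def f(category):
    # Hypothetical stand-in for the question's f: records that its body ran.
    calls.append(category)
    return category.upper()


coro = f("p")                   # only creates a coroutine object
ran_before = list(calls)        # [] : the body has not executed yet
result = asyncio.run(coro)      # an event loop now drives it to completion
ran_after = list(calls)         # ["p"]
```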

To sum up: inside f, you want to await asyncio.gather directly.

async def f(category, total):
    urls = [urls_template[category].format(t) for t in total]
    soups_coro = map(parseURL_async, urls)

    # The loop is already running here, so just await the gathered coroutines
    result = await asyncio.gather(*soups_coro)

You probably also want to add return result at the end of f.
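Putting that together, a runnable sketch of f. The values of urls_template and the body of parseURL_async are hypothetical placeholders (the question does not define them), and the placeholder sleeps instead of doing real network I/O.

```python
import asyncio

# Hypothetical stand-ins for names the question leaves undefined.
urls_template = {
    "p": "https://example.com/p/{}",
    "f": "https://example.com/f/{}",
}


async def parseURL_async(url):
    # Placeholder: sleeps instead of fetching and parsing the page.
    await asyncio.sleep(0.01)
    return "soup for " + url


async def f(category, total):
    urls = [urls_template[category].format(t) for t in total]
    soups_coro = map(parseURL_async, urls)

    # Await the gathered coroutines on the loop that is already running.
    result = await asyncio.gather(*soups_coro)
    return result  # one entry per URL, in the same order as urls


soups = asyncio.run(f("p", [1, 2]))
```

gather returns the results as a list in the order the coroutines were passed, regardless of which one finishes first.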

dirn

Convert main() into an async function and execute it with loop.run_until_complete().

When the code has only one run_until_complete() call, everything becomes much easier. Starting with Python 3.7, you can simply write asyncio.run(main()).
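A sketch of that structure, assuming hypothetical stand-ins for the question's urls_template, all_p_list, all_f_list and parseURL_async: main() becomes a coroutine that gathers both categories, each f gathers its own URLs, and the event loop is started in exactly one place.

```python
import asyncio

# Hypothetical stand-ins for the question's helpers and data.
urls_template = {"p": "https://example.com/p/{}", "f": "https://example.com/f/{}"}
all_p_list = [1, 2]
all_f_list = [3]


async def parseURL_async(url):
    await asyncio.sleep(0.01)  # placeholder for real network I/O and parsing
    return "soup for " + url


async def f(category, total):
    urls = [urls_template[category].format(t) for t in total]
    return await asyncio.gather(*map(parseURL_async, urls))


async def main():
    # Both categories run concurrently; inside, each f fans out over its URLs.
    p_result, f_result = await asyncio.gather(
        f("p", all_p_list),
        f("f", all_f_list),
    )
    return p_result, f_result


if __name__ == "__main__":
    # The only place the event loop is started (Python 3.7+).
    print(asyncio.run(main()))
    # Before 3.7: asyncio.get_event_loop().run_until_complete(main())
```

Because every coroutine below main() only awaits, none of them needs a reference to the loop.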

Andrew Svetlov