I want to scrape a set of categories, each of which has its own list of URLs. So I decided to call one function per category from the main function, and to make the non-blocking per-URL calls inside that inner function.
Here is the code:
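import asyncio

def main():
    loop = asyncio.get_event_loop()
    # Schedule one task per category; the task names must not shadow the coroutine function f
    p_task = loop.create_task(f("p", all_p_list))
    f_task = loop.create_task(f("f", all_f_list))
    # Drive the loop until both category tasks have finished
    loop.run_until_complete(asyncio.gather(p_task, f_task))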
This should run the two f() calls concurrently. But f also has to run the loop, because inside it another function is called concurrently, once per URL.
async def f(category, total):
    urls = [urls_template[category].format(t) for t in t_list]
    soups_coro = map(parseURL_async, urls)
    loop = asyncio.get_event_loop()
    result = await loop.run_until_complete(asyncio.gather(*soups_coro))
But when I run the script, I get a "This event loop is already running" error. I found that this happens because I call loop.run_until_complete() in both the inner and the outer function.
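To isolate the error, here is a minimal sketch that seems to reproduce it without any scraping code. The names inner() and outer() are hypothetical stand-ins for parseURL_async and f, and the sleep only simulates the real work.

```python
import asyncio

async def inner():
    # Hypothetical stand-in for the per-URL work
    await asyncio.sleep(0)
    return "done"

async def outer():
    # outer() is itself being executed by this loop
    loop = asyncio.get_running_loop()
    coro = inner()
    try:
        # Asking the running loop to run again is what raises the error
        loop.run_until_complete(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine was never started; close it to avoid a warning
        return str(exc)
    return "no error"

message = asyncio.run(outer())
print(message)
```

So the error does not depend on the scraping logic, only on calling run_until_complete() from code that the loop is already executing.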
However, when I remove run_until_complete() and just call f() in main(), the call returns immediately and does not wait for the inner function to finish. So it seems unavoidable to run the loop in main(). But that looks incompatible with the inner function, which also has to run it.
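As far as I understand, the "returns immediately" behaviour is because calling a coroutine function only creates a coroutine object and does not execute its body. A small sketch of that, using a simplified, hypothetical f() that just records that it ran:

```python
import asyncio
import inspect

ran = []

async def f(category):
    # Simplified stand-in for the real f(); records that the body executed
    ran.append(category)
    return category

coro = f("p")                              # creates a coroutine object only
was_coroutine = inspect.iscoroutine(coro)  # True
ran_before = list(ran)                     # still empty: the body has not run

result = asyncio.run(coro)                 # the body runs once a loop drives it
print(was_coroutine, ran_before, result, ran)
```

So something has to drive the coroutine, either the loop in main() or an await in another coroutine.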
How can I deal with this problem and run the loop? The original code was all in a single main() and it worked, but I want to make it cleaner if possible.
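For reference, this is roughly the structure I am aiming for. It is only a sketch under the assumption that the inner function can await asyncio.gather() directly instead of running the loop itself. parse_url() and scrape_category() are hypothetical stand-ins for parseURL_async and f, and the URLs are placeholders.

```python
import asyncio

async def parse_url(url):
    # Hypothetical stand-in for parseURL_async; the sleep simulates network I/O
    await asyncio.sleep(0.01)
    return f"parsed:{url}"

async def scrape_category(category, urls):
    # Inner level: await the per-URL coroutines on the loop that is already running
    results = await asyncio.gather(*(parse_url(u) for u in urls))
    return category, results

async def main():
    # Outer level: one coroutine per category, all run concurrently
    return await asyncio.gather(
        scrape_category("p", ["p/1", "p/2"]),
        scrape_category("f", ["f/1"]),
    )

# Only the outermost call starts the event loop
results = asyncio.run(main())
print(results)
```

In this shape the loop is started exactly once, at the top level, and every nested level uses await rather than run_until_complete().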