I have a list of website URLs that I want to download repeatedly (at variable time intervals) using Python. This has to happen asynchronously to cope with a large number of websites and/or long response times.
I've tried many things with event loops, queues, async functions, asyncio, etc., but I cannot get it to work. The following very simple version downloads the websites repeatedly, but not concurrently: the next download only starts after the previous one has finished.

import asyncio
import datetime

import aiohttp


def produce_helper(url: str):
    # helper, because I cannot call an async function with loop.call_later
    loop.create_task(produce(url))


async def produce(url: str):
    await q.put(url)
    print(f'{datetime.datetime.now().strftime("%H:%M:%S.%f")} - Produced {url}')


async def consume():
    async with aiohttp.ClientSession() as session:
        while True:
            url = await q.get()
            print(f'{datetime.datetime.now().strftime("%H:%M:%S.%f")} - Start: {url}')
            async with session.get(url, timeout=10) as response:
                print(f'{datetime.datetime.now().strftime("%H:%M:%S.%f")} - Finished: {url}')
            q.task_done()
            loop.call_later(10, produce_helper, url)


q = asyncio.Queue()
url_list = ["https://www.google.com/", "https://www.bing.com/", "https://www.yelp.com/"]
loop = asyncio.get_event_loop()
for url in url_list:
    loop.create_task(produce(url))
loop.create_task(consume())
loop.run_forever()

Is this a suitable approach for my problem? Is there anything better conceptually?
And how do I accomplish concurrent downloads?
Any help is appreciated.
EDIT:
The challenge (as described in the comment below) is the following: after each successful download, I want to add the respective URL back to the queue, to become due after a specified waiting time (10 s in the example in my question). As soon as it is due, I want to download the website again, add the URL back to the queue, and so on.
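
To make the intended behaviour concrete, here is a rough sketch of the "download, wait, download again" cycle with one task per URL instead of a shared queue. It is only an illustration under assumptions: `fake_download`, `poll`, `run_all`, the `rounds` limit and the short timings are hypothetical names and values, and `fake_download` merely sleeps in place of a real `session.get(url)` call so that the snippet runs without network access.

```python
import asyncio
import time


async def fake_download(url: str, duration: float) -> None:
    # Hypothetical stand-in for an aiohttp request; it only sleeps.
    await asyncio.sleep(duration)


async def poll(url: str, interval: float, rounds: int, log: list) -> None:
    # One task per URL: download, wait `interval` seconds, then repeat.
    for _ in range(rounds):
        log.append(("start", url, time.monotonic()))
        await fake_download(url, 0.05)
        log.append(("finished", url, time.monotonic()))
        await asyncio.sleep(interval)


async def run_all(urls, interval: float = 0.1, rounds: int = 2) -> list:
    log = []
    # gather() schedules all per-URL coroutines on the same event loop,
    # so their downloads and waiting periods overlap.
    await asyncio.gather(*(poll(u, interval, rounds, log) for u in urls))
    return log


urls = ["https://www.google.com/", "https://www.bing.com/", "https://www.yelp.com/"]
log = asyncio.run(run_all(urls))
for kind, url, stamp in log:
    print(f"{stamp:.3f} - {kind}: {url}")
```

In this sketch all three "start" entries are logged before any "finished" entry, which is the kind of overlap I am after. In a real version the loop would presumably run indefinitely (`while True`) and the interval would differ per URL.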