So I've done some research on this and I see there is a fair amount of documentation, but I am quite new to asynchronous programming and I am not grasping exactly what needs to be done.
I need to download a bunch of files asynchronously. So far I have:
- Defined an asynchronous download function
- Defined another asynchronous function that puts all the coroutines into a list
- Created an event loop and ran the coroutine creation function.
Example of download function:
async def download(link, name):
    # proxy is defined elsewhere in my script
    async with aiohttp.ClientSession() as session:
        async with session.get(link, proxy=proxy) as resp:
            content = await resp.read()
    file_name = f"files/{name}.pdf"
    # the with block closes the file, so no explicit f.close() is needed
    with open(file_name, 'wb') as f:
        f.write(content)
Example of task list creation function:
async def do_downloads(links, names):
    tasks = []
    for link, name in zip(links, names):
        tasks.append(download(link, name))
    await asyncio.wait(tasks)
I now run these functions with:
links = "some list of links"
names = "some list of the corresponding file names"
loop = asyncio.get_event_loop()
loop.run_until_complete(do_downloads(links, names))
loop.close()
If I run this program on all the links, it makes too many requests at once and the program crashes after completing about half of the downloads.
I crudely attempted to make batches from the total:
loop = asyncio.get_event_loop()
for _ in BATCHES:
    links = links[current_start:current_start + BATCH_LENGTH]
    names = names[current_start:current_start + BATCH_LENGTH]
    loop.run_until_complete(do_downloads(links, names))
    current_start += BATCH_LENGTH
loop.close()
But this isn't working; I am getting errors like "Set of coroutines is empty". I also can't handle exceptions when the server rejects my requests.
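To make the exception part concrete, here is a minimal sketch of the behaviour I am after: every download runs, and a rejected request is collected instead of taking everything down. It swaps `asyncio.wait` for `asyncio.gather(..., return_exceptions=True)`. The `fake_download` stand-in, the example.com URLs, and the `/rejected` convention are made up for illustration so that it runs without a network.

```python
import asyncio


async def fake_download(link, name):
    """Hypothetical stand-in for download(link, name); no network access."""
    await asyncio.sleep(0)
    if link.endswith("/rejected"):
        # Simulates the server refusing the request.
        raise ConnectionError(f"server rejected {link}")
    return name


async def do_downloads(links, names):
    jobs = [fake_download(link, name) for link, name in zip(links, names)]
    if not jobs:
        # asyncio.wait() raises ValueError for an empty collection,
        # so an empty batch is simply skipped here.
        return []
    # return_exceptions=True puts exceptions into the result list
    # instead of raising the first one.
    return await asyncio.gather(*jobs, return_exceptions=True)


links = ["https://example.com/a", "https://example.com/rejected", "https://example.com/c"]
names = ["a", "b", "c"]

results = asyncio.run(do_downloads(links, names))
errors = {name: r for name, r in zip(names, results) if isinstance(r, Exception)}
empty_result = asyncio.run(do_downloads([], []))
```

After this runs, `errors` maps each failed name to its exception, so those downloads could be retried or logged separately.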
I am aware there are much better ways of doing this; I just need someone to explain them to me, as I'm getting lost in the documentation!
Thanks
EDIT:
The first iteration always runs fine, but fails on the next one.
EDIT2:
This is the error I get from running the example, although I am looking for a better way to do this, not necessarily a solution to this problem...
Traceback (most recent call last):
  File "batch_get.py", line 60, in <module>
    main(loop)
  File "batch_get.py", line 53, in main
    loop.run_until_complete(do_downloads(links, names, indices))
  File "/home/liam/anaconda3/lib/python3.7/asyncio/base_events.py", line 583, in run_until_complete
    return future.result()
  File "batch_get.py", line 24, in do_downloads
    await asyncio.wait(tasks)
  File "/home/liam/anaconda3/lib/python3.7/asyncio/tasks.py", line 380, in wait
    raise ValueError('Set of coroutines/Futures is empty.')
ValueError: Set of coroutines/Futures is empty.
EDIT3:
So the solution to the example problem was that my list was empty on the second iteration. Silly me: it was the way I was indexing the list. Each iteration overwrote the full list with its own slice, and I never reinstated the full list at the end of the loop.
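The indexing mistake can be sketched like this: slicing into new names leaves the full lists intact, so later batches are not empty. The helper `make_batches`, the example.com URLs, and the batch length of 3 are made up for illustration.

```python
def make_batches(items, batch_length):
    """Split items into consecutive slices without modifying the original list."""
    return [items[i:i + batch_length] for i in range(0, len(items), batch_length)]


# Hypothetical inputs standing in for the real lists.
links = [f"https://example.com/{i}.pdf" for i in range(7)]
names = [f"file_{i}" for i in range(7)]

BATCH_LENGTH = 3
link_batches = make_batches(links, BATCH_LENGTH)
name_batches = make_batches(names, BATCH_LENGTH)

# Each pair (batch_links, batch_names) would be passed to do_downloads in turn.
batch_sizes = [len(batch) for batch in link_batches]
```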
But my real question is still whether there is a better way to do this, i.e. with queues or semaphores.
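From what I can tell from the documentation, the semaphore variant would look roughly like the sketch below: all coroutines are created up front, but only a fixed number may be inside the download at any moment. This is an untested-against-a-real-server sketch, not a definitive implementation. `fake_download`, `bounded_download`, `MAX_CONCURRENT`, the `stats` counter, and the example.com URLs are hypothetical names introduced for illustration, with `asyncio.sleep` standing in for the real request.

```python
import asyncio

MAX_CONCURRENT = 3  # hypothetical limit; tune to what the server tolerates

# Bookkeeping used only to observe how many downloads overlap.
stats = {"active": 0, "peak": 0}


async def fake_download(link, name):
    """Hypothetical stand-in for download(link, name); no network access."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    try:
        await asyncio.sleep(0.01)  # simulates the time spent on the request
        return name
    finally:
        stats["active"] -= 1


async def bounded_download(semaphore, link, name):
    # Only MAX_CONCURRENT coroutines can be inside this block at once;
    # the rest wait here until a slot is released.
    async with semaphore:
        return await fake_download(link, name)


async def do_downloads(links, names):
    # Created inside the coroutine so it belongs to the running event loop.
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    jobs = [bounded_download(semaphore, link, name)
            for link, name in zip(links, names)]
    return await asyncio.gather(*jobs)


links = [f"https://example.com/{i}.pdf" for i in range(8)]
names = [f"file_{i}" for i in range(8)]
results = asyncio.run(do_downloads(links, names))
```

With this shape there is no manual batching at all: a new download starts as soon as any earlier one finishes, rather than waiting for a whole batch to complete.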