
So I've done some research on this and I see there is a fair amount of documentation, but I am quite new to asynchronous programming and I am not grasping exactly what needs to be done.

I need to download a bunch of files asynchronously. So far I have:

  1. Defined an asynchronous download function
  2. Defined another asynchronous function that puts all the coroutines into a list
  3. Created an event loop and run the coroutine creation function.

Example of download function:

async def download(link, name):
    async with aiohttp.ClientSession() as session:
        async with session.get(link, proxy = proxy) as resp:
            content = await resp.read()
        file_name = f"files/{name}.pdf"
    with open(file_name, 'wb') as f:
        f.write(content)
    f.close()

Example of task list creation function:

async def do_downloads(links, names):
    tasks = []
    for link, name in zip(links, names):
        tasks.append(download(link, name))
    await asyncio.wait(tasks)

I now run these functions with:

links = "some list of links"
names = "some list of the corresponding file names"
loop = asyncio.get_event_loop()
loop.run_until_complete(do_downloads(links, names))
loop.close()

If I run this program on all the links, it makes too many requests and the program crashes after completing about half of the downloads.

I crudely attempted to split the total into batches:

loop = asyncio.get_event_loop()
for _ in BATCHES:
    links = links[current_start:current_start + BATCH_LENGTH]
    names = names[current_start:current_start + BATCH_LENGTH]
    loop.run_until_complete(do_downloads(links, names))
    current_start += BATCH_LENGTH
loop.close()

But this isn't working; I am getting errors like "Set of coroutines is empty", etc. I also can't handle exceptions when the server rejects my requests.
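For the exception side, what I'm after is roughly the sketch below (`safe_download` is just a name I made up, and the logging is only illustrative): each download is wrapped so a rejected request gets reported instead of taking down the whole batch.

import asyncio
import aiohttp

async def safe_download(link, name):
    # Wraps the download coroutine from above so one rejected request
    # doesn't kill the rest of the batch.
    try:
        await download(link, name)
    except (aiohttp.ClientError, asyncio.TimeoutError) as exc:
        print(f"download of {link} failed: {exc!r}")

async def do_downloads(links, names):
    tasks = [safe_download(link, name) for link, name in zip(links, names)]
    # return_exceptions=True makes gather collect any remaining exceptions
    # instead of raising the first one it sees.
    await asyncio.gather(*tasks, return_exceptions=True)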

I am aware there are much better ways of doing this; I just need someone to explain them to me, as I'm getting lost in the documentation!

Thanks

EDIT:

The first iteration always runs fine, but fails on the next one.

EDIT2:

This is the error I get from running the example, although I am looking for a better way to do this, not necessarily a solution to this problem...

Traceback (most recent call last):
  File "batch_get.py", line 60, in <module>
    main(loop)
  File "batch_get.py", line 53, in main
    loop.run_until_complete(do_downloads(links, names, indices))
  File "/home/liam/anaconda3/lib/python3.7/asyncio/base_events.py", line 583, in run_until_complete
    return future.result()
  File "batch_get.py", line 24, in do_downloads
    await asyncio.wait(tasks)
  File "/home/liam/anaconda3/lib/python3.7/asyncio/tasks.py", line 380, in wait
    raise ValueError('Set of coroutines/Futures is empty.')
ValueError: Set of coroutines/Futures is empty.

EDIT3:

So the solution to the example problem was that my lists were empty on the second iteration... silly me, it was just the way I was indexing the lists and not reinstating them at the end of each loop (I was slicing `links` and `names` in place, so the second iteration sliced the already-sliced lists).
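For reference, the fix was just to take each batch as a slice of the full lists instead of overwriting `links`/`names` in place, roughly like this (`all_links`/`all_names` are just illustrative names):

loop = asyncio.get_event_loop()
# Slice each batch out of the full lists; never reassign the originals.
for start in range(0, len(all_links), BATCH_LENGTH):
    batch_links = all_links[start:start + BATCH_LENGTH]
    batch_names = all_names[start:start + BATCH_LENGTH]
    loop.run_until_complete(do_downloads(batch_links, batch_names))
loop.close()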

But my real question is still about a better way to do this, i.e. with queues or semaphores.
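What I picture is roughly the sketch below, where a semaphore caps how many requests are in flight at once (the limit of 10, the `bounded_download` helper name, and the shared session are placeholders of mine, and I've left the proxy out):

import asyncio
import aiohttp

CONCURRENT_DOWNLOADS = 10  # made-up limit; tune it to what the server tolerates

async def bounded_download(semaphore, session, link, name):
    # The semaphore lets at most CONCURRENT_DOWNLOADS requests run at once;
    # the remaining coroutines wait here instead of all hitting the server together.
    async with semaphore:
        async with session.get(link) as resp:
            content = await resp.read()
    with open(f"files/{name}.pdf", "wb") as f:
        f.write(content)

async def do_downloads(links, names):
    semaphore = asyncio.Semaphore(CONCURRENT_DOWNLOADS)
    # One shared session for all downloads instead of one per file.
    async with aiohttp.ClientSession() as session:
        tasks = [bounded_download(semaphore, session, link, name)
                 for link, name in zip(links, names)]
        await asyncio.gather(*tasks)

# e.g. asyncio.run(do_downloads(links, names))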

liamod
  • when you use `with open(...) as f:` then you don't need `f.close()` – furas Dec 09 '20 at 10:26
  • always put the full error message (starting at the word "Traceback") in the question (not a comment) as text (not a screenshot). It contains other useful information. – furas Dec 09 '20 at 10:27
  • You could use a semaphore to limit the number of concurrent downloads https://stackoverflow.com/a/31795242/4279 – jfs Dec 09 '20 at 16:27
