22

I'm trying to make a bunch of requests (~1000) using Asyncio and the aiohttp library, but I am running into a problem that I can't find much info on.

When I run this code with 10 URLs, it runs just fine. When I run it with 100+ URLs, it breaks and gives me a `RuntimeError: Event loop is closed` error.

import asyncio
import aiohttp


@asyncio.coroutine
def get_status(url):
    code = '000'
    try:
        res = yield from asyncio.wait_for(aiohttp.request('GET', url), 4)
        code = res.status
        res.close()
    except Exception as e:
        print(e)
    print(code)


if __name__ == "__main__":
    urls = ['https://google.com/'] * 100
    coros = [asyncio.Task(get_status(url)) for url in urls]
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.wait(coros))
    loop.close()

The stack trace can be found here.

Any help or insight would be greatly appreciated, as I've been banging my head against this for a few hours now. Obviously the error suggests that an event loop has been closed while it should still be open, but I don't see how that is possible.

Patrick Allen
  • This is not an `Asyncio` error; it's a Python recursion error (the recursion limit was reached). You need a thread for every non-class function... – dsgdfg Sep 16 '15 at 05:17
  • First, make sure you are using the latest aiohttp release (I assume you do). Technically, aiohttp needs one loop iteration after finishing a request to close the underlying sockets, so insert `loop.run_until_complete(asyncio.sleep(0))` before the `loop.close()` call (see the sketch after these comments). – Andrew Svetlov Sep 16 '15 at 06:00
  • Your traceback suggests that a job submitted to an [Executor](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor) through [run_in_executor](https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.BaseEventLoop.run_in_executor) returned after the loop has been closed. Weirdly enough, [aiohttp](https://github.com/KeepSafe/aiohttp/search?utf8=%E2%9C%93&q=run_in_executor&type=Code) and [asyncio](https://github.com/python/asyncio/search?utf8=%E2%9C%93&q=run_in_executor) don't use `run_in_executor`... – Vincent Sep 16 '15 at 13:11
  • @AndrewSvetlov, thanks for the reply - I tried sleeping before close, but still no dice... any other ideas? – Patrick Allen Sep 16 '15 at 13:38
  • @Vincent technically they do: DNS resolution is performed by `run_in_executor` -- but it should be done before the `get_status` tasks finish. – Andrew Svetlov Sep 16 '15 at 13:44
  • For anyone using python's async socket.io, make sure to run `await sio.wait()` in your main function – LeoDog896 Dec 17 '22 at 14:43
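
For reference, a minimal sketch of the extra-iteration workaround from the comments, applied to the script above (where the added `asyncio.sleep(0)` goes is an assumption; the OP reports it did not fix the problem in this case):

if __name__ == "__main__":
    urls = ['https://google.com/'] * 100
    coros = [asyncio.Task(get_status(url)) for url in urls]
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.wait(coros))
    # Give aiohttp one more loop iteration to close the underlying sockets
    loop.run_until_complete(asyncio.sleep(0))
    loop.close()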

3 Answers

18

The bug is filed as https://github.com/python/asyncio/issues/258. Stay tuned.

As a quick workaround I suggest using a custom executor, e.g.

import concurrent.futures

loop = asyncio.get_event_loop()
executor = concurrent.futures.ThreadPoolExecutor(5)
loop.set_default_executor(executor)

Before finishing your program, please do

executor.shutdown(wait=True)
loop.close()
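
Put together with the script from the question (assuming the same `get_status` coroutine), a minimal sketch might look like this; the pool size of 5 is just an example:

import asyncio
import concurrent.futures

loop = asyncio.get_event_loop()
executor = concurrent.futures.ThreadPoolExecutor(5)
loop.set_default_executor(executor)

urls = ['https://google.com/'] * 100
coros = [asyncio.Task(get_status(url)) for url in urls]
loop.run_until_complete(asyncio.wait(coros))

# Wait for the executor's worker threads (e.g. pending getaddrinfo calls)
# to finish before closing the loop, so nothing is scheduled on a closed loop.
executor.shutdown(wait=True)
loop.close()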
Andrew Svetlov
7

You're right, loop.getaddrinfo uses a ThreadPoolExecutor to run socket.getaddrinfo in a thread.

You're using asyncio.wait_for with a timeout, which means res = yield from asyncio.wait_for(...) raises an asyncio.TimeoutError after 4 seconds. The get_status coroutines then return, the loop stops, and it gets closed. If an executor job finishes after that, it tries to schedule a callback in the event loop and raises an exception, since the loop is already closed.
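
One way to avoid this (see also the comments below) is to let the default executor drain before closing the loop. A minimal sketch, relying on the loop's private `_default_executor` attribute and the names from the question:

loop.run_until_complete(asyncio.wait(coros))

# Wait for any getaddrinfo jobs still running in the executor's threads
# before closing the loop.
if loop._default_executor is not None:
    loop._default_executor.shutdown(wait=True)
loop.close()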

Vincent
  • Ahh, that makes sense, but this is the only way I have found to implement request timeouts. Do you know of a way that I could time out without closing the loop? – Patrick Allen Sep 16 '15 at 15:03
  • @PatrickAllen You might want to increase the [number of workers](https://github.com/python/asyncio/blob/27f3499f968e8734fef91677eb339b5d32a6f675/asyncio/base_events.py#L44) that is 5 by default. – Vincent Sep 16 '15 at 15:58
  • 2
    @PatrickAllen Or use `loop._default_executor.shutdown(wait=True)` before closing the loop. – Vincent Sep 16 '15 at 16:00
  • I'll mark this as answered, because this seems to have fixed the original problem. Should I be limiting the max number of connections? It seems that requests are timing out for no apparent reason. Maybe I'm making too many requests too quickly? – Patrick Allen Sep 16 '15 at 16:11
  • @PatrickAllen Well, 5 worker threads and a thousand requests means you're trying to run 200 `socket.getaddrinfo` calls in 4 seconds, which seems reasonable to me, even though the number of workers can be increased. You can also give a custom `TCPConnector` to `request` in order to specify a connection timeout (see the sketch below): `connector=aiohttp.TCPConnector(loop=loop, force_close=True, conn_timeout=1)` – Vincent Sep 17 '15 at 08:39
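
A minimal sketch of that connector-based timeout, assuming the 2015-era aiohttp API used in the question; in practice you would probably share one connector across all requests rather than creating one per call:

@asyncio.coroutine
def get_status(url, loop):
    code = '000'
    # Limit the connection phase to 1 second instead of wrapping the whole
    # request in asyncio.wait_for.
    connector = aiohttp.TCPConnector(loop=loop, force_close=True, conn_timeout=1)
    try:
        res = yield from aiohttp.request('GET', url, connector=connector)
        code = res.status
        res.close()
    except Exception as e:
        print(e)
    print(code)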
0

This is a bug in the interpreter. Fortunately, it was finally fixed in Python 3.10.6, so you just need to update your installed Python.

Arhadthedev