
I am trying to teach myself Python's async functionality, so I have built an async web scraper. To be a good citizen toward the servers I scrape, I would like to limit the total number of connections I have open at once. I know that semaphores are a good solution, and the asyncio library has a built-in semaphore class. My issue is that Python complains when I use yield from inside an async function, because that combines yield and await syntax. Below is the exact code I am using:

import asyncio
import aiohttp

sema = asyncio.BoundedSemaphore(5)

async def get_page_text(url):
    with (yield from sema):
        try:
            resp = await aiohttp.request('GET', url)
            if resp.status == 200:
                ret_val = await resp.text()
        except:
            raise ValueError
        finally:
            await resp.release()
    return ret_val

This raises the following exception:

File "<ipython-input-3-9b9bdb963407>", line 14
    with (yield from sema):
         ^
SyntaxError: 'yield from' inside async function

Some possible solutions I can think of:

  1. Just use the @asyncio.coroutine decorator
  2. Use threading.Semaphore? This seems like it may cause other issues
  3. Try this in the Python 3.6 beta, in case the behavior differs there.

I am very new to Python's async functionality so I could be missing something obvious.

Sebastian Wozny
Bruce Pucci

3 Answers


Use the async with statement instead. asyncio.BoundedSemaphore is an asynchronous context manager, so async with acquires it on entry and releases it on exit:

#!/usr/local/bin/python3.5
import asyncio
from aiohttp import ClientSession


sema = asyncio.BoundedSemaphore(5)

async def hello(url):
    async with ClientSession() as session:
        # One statement, two context managers: acquire the semaphore first,
        # then open the request; both are released when the block exits.
        async with sema, session.get(url) as response:
            body = await response.read()
            print(body)

loop = asyncio.get_event_loop()
loop.run_until_complete(hello("http://httpbin.org/headers"))

Example taken from here. The page is also a good primer for asyncio and aiohttp in general.
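The example above fetches a single URL, so the semaphore never has to make anything wait. The question is really about many requests running at once. Here is a minimal sketch of that case, assuming Python 3.7+ (for asyncio.run). The hypothetical fake_fetch uses asyncio.sleep as a stand-in for session.get(url), so it runs without a network. It launches 20 coroutines with asyncio.gather and records the peak number inside the semaphore at the same time.

```python
import asyncio

MAX_CONNECTIONS = 5  # same limit as BoundedSemaphore(5) above


async def fake_fetch(url, sema, stats):
    # Hypothetical stand-in for session.get(url); makes no network access.
    async with sema:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # pretend the server takes a moment
        stats["active"] -= 1
        return f"body of {url}"


async def main(urls):
    # Create the semaphore inside the running event loop.
    sema = asyncio.BoundedSemaphore(MAX_CONNECTIONS)
    stats = {"active": 0, "peak": 0}
    # Start every fetch at once; the semaphore decides how many proceed.
    bodies = await asyncio.gather(*(fake_fetch(u, sema, stats) for u in urls))
    return bodies, stats["peak"]


if __name__ == "__main__":
    urls = [f"http://example.com/page/{i}" for i in range(20)]
    bodies, peak = asyncio.run(main(urls))
    print(len(bodies), "pages fetched, at most", peak, "at once")
```

Run as a script, this prints "20 pages fetched, at most 5 at once". The semaphore is created inside main rather than at module level. Before Python 3.10, asyncio primitives could be bound to whichever event loop existed when they were constructed, and that loop is not necessarily the one asyncio.run starts.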

Sebastian Wozny
  • I find the example very interesting, however something surprises me. It is this line: async with sema, session.get(url) as response: How come you are using sema, session... on the same context manager? What does that line do? Can you explain a little? – Liviu Feb 08 '18 at 16:46
  • Nothing special going on there, I think this answer will help: https://stackoverflow.com/questions/3024925/python-create-a-with-block-on-several-context-managers – Sebastian Wozny Feb 09 '18 at 14:57
  • How's using semaphore different from limiting the connection pool size - https://docs.aiohttp.org/en/stable/client_advanced.html#limiting-connection-pool-size – Shiva Nov 05 '20 at 11:16
  • Why does `BoundedSemaphore` take the argument 5? Can we send different inputs? – alper Aug 16 '21 at 02:51
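On the last comment: the argument is the initial value of the semaphore's counter, which is how many coroutines may hold it at once. Any non-negative integer is accepted, and 5 is simply the connection limit chosen in the answer. "Bounded" adds one safety check: releasing more times than you acquired raises ValueError instead of silently raising the limit. The following is a small sketch of both points, assuming Python 3.7+. holders_allowed and over_release are hypothetical helper names.

```python
import asyncio


async def holders_allowed(n):
    # Count how many times a Semaphore(n) can be acquired without waiting.
    sem = asyncio.Semaphore(n)
    count = 0
    while not sem.locked():
        await sem.acquire()
        count += 1
    return count


async def over_release(sem):
    # Acquire once, then (incorrectly) release twice.
    await sem.acquire()
    sem.release()
    try:
        sem.release()
    except ValueError:
        return "ValueError"
    return "no error"


async def main():
    # A plain Semaphore silently grows past its initial value;
    # a BoundedSemaphore refuses.
    plain = asyncio.Semaphore(2)
    bounded = asyncio.BoundedSemaphore(2)
    return await over_release(plain), await over_release(bounded)


if __name__ == "__main__":
    print(asyncio.run(holders_allowed(5)))  # 5
    print(asyncio.run(main()))  # ('no error', 'ValueError')
```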

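OK, so this is really silly, but I just replaced yield from with await in the semaphore context manager (with (await sema):), and it works perfectly. Note that this form was deprecated in Python 3.7 and removed in Python 3.9. On current versions, use async with sema: instead.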

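import asyncio
import aiohttp

sema = asyncio.BoundedSemaphore(5)

async def get_page_text(url):
    # `async with sema:` replaces the original `with (await sema):`,
    # which Python 3.9+ no longer accepts.
    async with sema:
        # aiohttp.request is an async context manager; leaving the block
        # releases the response, so no explicit resp.release() is needed.
        async with aiohttp.request('GET', url) as resp:
            if resp.status != 200:
                raise ValueError(f"{url} returned HTTP {resp.status}")
            return await resp.text()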
Bruce Pucci
  • Even better, you can use the `async with` statement: `async with sema: [...]` – Vincent Nov 28 '16 at 08:22
  • Async is also interesting for me. I found your topic interesting and voted to leave it here. By the way, could you share your whole code, for example on GitHub? – Eugene Lisitsky Nov 28 '16 at 12:03
  • Yes, I will do that and post a link in the comments a little later. – Bruce Pucci Nov 28 '16 at 13:06
  • @EugeneLisitsky Here is the code... https://github.com/brucepucci/asyncio_scraper. I am trying to get all of the links that start with "gid" in this file structure... http://gd2.mlb.com/components/game/mlb/year_2016/ – Bruce Pucci Nov 29 '16 at 02:42

For the semaphore only:

import asyncio

sem = asyncio.Semaphore(10)

# ... later, inside an async function
async with sem:
    ...  # work with shared resource

which is equivalent to:

import asyncio

sem = asyncio.Semaphore(10)

# ... later, inside an async function
await sem.acquire()
try:
    ...  # work with shared resource
finally:
    sem.release()

ref: https://docs.python.org/3/library/asyncio-sync.html#asyncio.Semaphore
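Because the release sits in a finally clause, the permit is returned even when the guarded code raises. A failed request therefore cannot permanently shrink the pool. Here is a small check of that for both forms, assuming Python 3.7+ and two hypothetical jobs that always fail while holding the semaphore.

```python
import asyncio


async def failing_job(sem):
    # Hypothetical job that fails while holding the semaphore (async with form).
    async with sem:
        raise RuntimeError("request failed")


async def manual_failing_job(sem):
    # The same job written with the explicit acquire/try/finally form.
    await sem.acquire()
    try:
        raise RuntimeError("request failed")
    finally:
        sem.release()


async def main():
    # With a single permit, a leaked acquire would make the second job
    # wait forever; it runs because the first job released on error.
    sem = asyncio.Semaphore(1)
    for job in (failing_job, manual_failing_job):
        try:
            await job(sem)
        except RuntimeError:
            pass
    # False means the permit is free again after both failures.
    return sem.locked()


if __name__ == "__main__":
    print(asyncio.run(main()))  # False
```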

renan