
I am making requests to Azure Maps. I have a subscription key (subscriptionKey) and a list of addresses I want to look up (addresses):

query_template = 'https://atlas.microsoft.com/search/address/json?&subscription-key={}&api-version=1.0&language=en-US&query={}'
queries = [query_template.format(subscriptionKey, address) for address in addresses]
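The template above puts each raw address straight into the URL. Addresses often contain spaces, commas or `#`, so a somewhat safer way to build the same queries is to let `urllib.parse.urlencode` percent-encode every parameter. This is a minimal sketch that assumes the same endpoint and parameters as `query_template`; `build_query`, `BASE_URL` and the sample key and addresses are hypothetical placeholders:

```python
from urllib.parse import urlencode, quote

# Same endpoint as query_template (assumed unchanged)
BASE_URL = 'https://atlas.microsoft.com/search/address/json'

def build_query(subscription_key, address):
    # Same parameters as query_template, but every value is percent-encoded;
    # quote_via=quote encodes spaces as %20 instead of '+'
    params = {
        'subscription-key': subscription_key,
        'api-version': '1.0',
        'language': 'en-US',
        'query': address,
    }
    return BASE_URL + '?' + urlencode(params, quote_via=quote)

# Hypothetical inputs, for illustration only
subscriptionKey = 'my-key'
addresses = ['1 Microsoft Way, Redmond, WA', 'Calle Mayor 1 #2, Madrid']
queries = [build_query(subscriptionKey, address) for address in addresses]
```

Without encoding, a `#` in an address would be read as the start of a URL fragment and silently truncate the query.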

I came here from this question (you don't need to read it to follow what comes next), and everything worked fine on my sample of 1k queries. However, when I tried 10k queries I got ValueError: too many file descriptors in select(). I applied some of the answers from here, and now my code looks like this:

import asyncio
from aiohttp import ClientSession
from ssl import SSLContext
from sys import platform
import nest_asyncio
nest_asyncio.apply()

# Function to get a JSON from the result of a query
async def fetch(url, session):
    async with session.get(url, ssl=SSLContext()) as response:
        return await response.json()

# Function to run 'fetch()' with a Semaphore and check that the result is a dictionary (JSON)
async def fetch_sem(sem, attempts, url, session):
    semaphore = asyncio.Semaphore(sem)
    async with semaphore:
        for _ in range(attempts):
            result = await fetch(url, session)
            if isinstance(result, dict):
                break
        return result

# Function to search for all queries
async def fetch_all(sem, attempts, urls):
    async with ClientSession() as session:
        return await asyncio.gather(*[fetch_sem(sem, attempts, url, session) for url in urls], return_exceptions=True)

# Making the queries
if __name__ == '__main__':
    if platform == 'win32':
        loop = asyncio.ProactorEventLoop()
        asyncio.set_event_loop(loop)
    loop = asyncio.get_event_loop()
    results = loop.run_until_complete(fetch_all(1000, 3, queries))

Note that I have included both asyncio.Semaphore and asyncio.ProactorEventLoop(). Despite these additions, I still get ValueError: too many file descriptors in select().

Could I get some help with this issue? Thank you!

gfsemelas

1 Answer


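The purpose of the semaphore is to count how many fetch operations are currently running and enforce an upper limit. That's why all of them must share one semaphore: your fetch_sem creates a new asyncio.Semaphore on every call, so each request gets its own counter and nothing is actually limited.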

You could create it in fetch_all and pass it to fetch_sem:

async def fetch_sem(semaphore, attempts, url, session):
    async with semaphore:
        ... 
        return result

async def fetch_all(limit, attempts, urls):
    semaphore = asyncio.Semaphore(limit)
    async with ClientSession() as session:
        return await asyncio.gather(*[fetch_sem(semaphore, attempts, url, session) for url in urls], return_exceptions=True)

....
results = loop.run_until_complete(fetch_all(1000, 3, queries))
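To see why a per-call semaphore limits nothing, here is a small stdlib-only sketch that replaces the HTTP request with a hypothetical `fake_fetch()` coroutine and records the peak number of overlapping calls. The helper names and the values 50 and 5 are illustrative assumptions, not part of the original code:

```python
import asyncio

async def fake_fetch(url, stats):
    # Stand-in for the real HTTP request: tracks how many calls overlap
    stats['active'] += 1
    stats['peak'] = max(stats['peak'], stats['active'])
    await asyncio.sleep(0.01)
    stats['active'] -= 1
    return {'url': url}

async def per_call_semaphore(limit, url, stats):
    # The question's pattern: a brand-new semaphore for every call
    async with asyncio.Semaphore(limit):
        return await fake_fetch(url, stats)

async def shared_semaphore(semaphore, url, stats):
    # The answer's pattern: every call waits on the same semaphore
    async with semaphore:
        return await fake_fetch(url, stats)

async def peak_concurrency(shared, n_urls=50, limit=5):
    stats = {'active': 0, 'peak': 0}
    urls = [f'url-{i}' for i in range(n_urls)]
    if shared:
        semaphore = asyncio.Semaphore(limit)
        coros = [shared_semaphore(semaphore, u, stats) for u in urls]
    else:
        coros = [per_call_semaphore(limit, u, stats) for u in urls]
    await asyncio.gather(*coros)
    return stats['peak']

print(asyncio.run(peak_concurrency(shared=False)))  # prints 50
print(asyncio.run(peak_concurrency(shared=True)))   # prints 5
```

With one semaphore per call, all 50 fake requests run at the same time; with a shared semaphore the peak never exceeds the limit. With real requests, each overlapping call typically holds an open socket that the event loop has to watch.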
VPfB
  • Oh, I get it. I didn't realize that I was creating a new Semaphore each time ```fetch_all()``` was calling ```fetch_sem()```. I corrected it. However, I keep getting ```ValueError: too many file descriptors in select()```... – gfsemelas Jul 07 '21 at 22:20
  • Did you try lowering the semaphore value (limit)? I don't work with Windows and don't know how many descriptors a proactor loop can handle. Also, please check the stack trace to verify that the error really occurs in the asyncio proactor loop code. – VPfB Jul 08 '21 at 05:46
  • @GonzaloFS The error message sounds like the proactor event loop is not being set up correctly, because it should never run `select()`. Maybe the `nest_asyncio` setup is causing issues, try removing it and see if you still get the same error. – user4815162342 Jul 08 '21 at 06:51
  • Yes, I tried lowering the limit, even to really low numbers, but still no luck. Regarding the ```nest_asyncio``` library, if I remove its setup I immediately get ```RuntimeError: This event loop is already running```. I got this solution from [here](https://medium.com/@vyshali.enukonda/how-to-get-around-runtimeerror-this-event-loop-is-already-running-3f26f67e762e), and it seemed to work. – gfsemelas Jul 08 '21 at 11:41
  • Yes, I tried lowering the limit, even to really low numbers, but still no luck. I saw [here](https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html#:~:text=That%E2%80%99s%20bad%2C%20seems,tasks%20of%201000.) that the limit should be 1024. However, I tried even 10 (it takes a while, but just as a test) and I get the same error. – gfsemelas Jul 08 '21 at 11:51
  • The ```asyncio.ProactorEventLoop()``` is just something I tried, but it makes no difference to my error. If I analyze the traceback, the errors are raised in: **lib\site-packages\nest_asyncio.py in run_until_complete(self, future)**, **lib\site-packages\nest_asyncio.py in _run_once(self)**, **lib\selectors.py in select(self, timeout)**, **lib\selectors.py in _select(self, r, w, _, timeout)** – gfsemelas Jul 08 '21 at 12:00
  • @GonzaloFS Thanks for the feedback. Is `nest_asyncio` really required? Could you try without it? Or move `nest_asyncio.apply()` after `asyncio.set_event_loop`. – VPfB Jul 08 '21 at 12:14
  • @VPfB If I remove ```nest_asyncio.apply()``` I immediately get ```RuntimeError: This event loop is already running```. If I move ```nest_asyncio.apply()``` after ```asyncio.set_event_loop()``` I still get ```ValueError: too many file descriptors in select()``` – gfsemelas Jul 08 '21 at 12:36
  • @GonzaloFS Unfortunately I do not understand what is going wrong. I cannot reproduce the issue either, because I'm on Linux. I'm sorry, I cannot help more. – VPfB Jul 08 '21 at 12:47
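The traceback quoted in the comments goes through `selectors.py`, which suggests that the loop actually running the code is a selector-based loop, not the `ProactorEventLoop` the question creates. The `RuntimeError: This event loop is already running` also hints that the code runs inside an environment that already owns a loop (for example Jupyter/IPython; this is an assumption, since the question does not say). A minimal diagnostic sketch with hypothetical helper names: `loop_info()` reports the class of the loop that really executes a coroutine, and `loop_is_running()` tells whether a loop is already running in the current thread:

```python
import asyncio

async def loop_info():
    # Inspect the loop that is actually executing this coroutine
    loop = asyncio.get_running_loop()
    uses_selector = isinstance(loop, asyncio.SelectorEventLoop)
    return type(loop).__name__, uses_selector

def loop_is_running():
    # get_running_loop() raises RuntimeError when no loop runs in this thread
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True

if __name__ == '__main__':
    if loop_is_running():
        # e.g. inside a notebook cell: run `await loop_info()` there instead
        print('A loop is already running; use `await loop_info()` instead.')
    else:
        print(asyncio.run(loop_info()))
```

According to the asyncio platform-support documentation, the selector-based loop on Windows is limited to 512 sockets, so if `loop_info()` reports a selector loop there, keeping the number of simultaneous requests well below that (with a single shared semaphore) is one thing to check.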