
I load my proxies into the `proxies` variable and try to make async requests to get the IP. It's simple:

import asyncio
import time

import aiohttp


async def get_ip(proxy):
    timeout = aiohttp.ClientTimeout(connect=5)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        try:
            async with session.get('https://api.ipify.org?format=json', proxy=proxy, timeout=timeout) as response:
                json_response = await response.json()
                print(json_response)
        except:
            pass


if __name__ == "__main__":
    proxies = []

    start_time = time.time()
    loop = asyncio.get_event_loop()
    tasks = [asyncio.ensure_future(get_ip(proxy)) for proxy in proxies]
    loop.run_until_complete(asyncio.wait(tasks))
    print('time spent to work: {} sec --------------'.format(time.time()-start_time))

This code works fine when I make 100-200-300-400 requests, but as soon as the count goes above 500 I always get this error:

Traceback (most recent call last):
  File "async_get_ip.py", line 60, in <module>
    loop.run_until_complete(asyncio.wait(tasks))
  File "C:\Python37\lib\asyncio\base_events.py", line 571, in run_until_complete
    self.run_forever()
  File "C:\Python37\lib\asyncio\base_events.py", line 539, in run_forever
    self._run_once()
  File "C:\Python37\lib\asyncio\base_events.py", line 1739, in _run_once
    event_list = self._selector.select(timeout)
  File "C:\Python37\lib\selectors.py", line 323, in select
    r, w, _ = self._select(self._readers, self._writers, [], timeout)
  File "C:\Python37\lib\selectors.py", line 314, in _select
    r, w, x = select.select(r, w, w, timeout)
ValueError: too many file descriptors in select()

I was looking for a solution, but all I found was an OS-level limitation. Can I somehow get around this problem without using additional libraries?

kshnkvn
    Since it appears you're using Windows, try switching to the [proactor event loop](https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.ProactorEventLoop) which doesn't use select(). – user4815162342 Jul 24 '19 at 13:04
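Following the comment's suggestion, switching to the proactor loop on Windows could be sketched roughly like this (the policy line is simply skipped on other platforms):

```python
import asyncio
import sys

# On Windows, the default selector event loop is backed by select(),
# which is limited to roughly 512 sockets; the proactor loop uses IOCP
# and has no such limit.
if sys.platform == "win32":
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())

async def main():
    # stand-in for the real request-spawning code
    return "ok"

result = asyncio.run(main())
print(result)
```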

1 Answer


It's not a good idea to start an unlimited number of requests simultaneously. Each started request consumes resources, from CPU and RAM to the OS's select() capacity, which, as in your case, sooner or later leads to problems.

To avoid the situation you should use asyncio.Semaphore, which lets you limit the maximum number of simultaneous connections.

I believe only a few changes need to be made to your code:

sem = asyncio.Semaphore(50)  # allow at most 50 requests in flight at once

async def get_ip(proxy):
    async with sem:
        # ...

Here's a full, more complex example of how to use a semaphore in general.
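A self-contained sketch of the same idea, with a dummy coroutine standing in for the real aiohttp request (the `fetch` name and counters are illustrative):

```python
import asyncio

active = 0  # number of coroutines currently inside the semaphore
peak = 0    # highest concurrency observed

async def fetch(i, sem):
    global active, peak
    async with sem:                     # blocks once 50 coroutines are inside
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.001)      # stand-in for the real network request
        active -= 1
        return i

async def main():
    sem = asyncio.Semaphore(50)
    # 500 tasks are created up front, but at most 50 run the body at once
    return await asyncio.gather(*(fetch(i, sem) for i in range(500)))

results = asyncio.run(main())
print(f"completed={len(results)}, peak concurrency={peak}")
```

Because only 50 sockets would ever be open at once, the select() limit is never reached, at the cost of the work proceeding in waves.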


P.S.

except:
    pass

You should never do that: it will just break your code sooner or later.

At the very least, use except Exception.
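To illustrate why, here's a small sketch showing how a bare except: swallows GeneratorExit and breaks generator cleanup, while except Exception lets it propagate:

```python
def leaky():
    try:
        yield 1
    except:            # bare except also catches GeneratorExit
        yield 2        # yielding here "ignores" the close request

def safe():
    try:
        yield 1
    except Exception:  # GeneratorExit is not an Exception, so it propagates
        yield 2

g = leaky()
next(g)
try:
    g.close()          # throws GeneratorExit into the generator
except RuntimeError as e:
    print("leaky:", e)  # generator ignored GeneratorExit

g = safe()
next(g)
g.close()              # closes cleanly
print("safe: closed cleanly")
```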

Mikhail Gerasimov
  • wow, thank you. Can you please explain the difference between ```except``` and ```except Exception```? Or do you mean something like ```except ValueError```? – kshnkvn Jul 25 '19 at 08:36
  • 1
  • @kshnkvn `except:` will catch [every kind of exception](https://docs.python.org/3/library/exceptions.html#exception-hierarchy), including "service exceptions" like `GeneratorExit` or `SystemExit`, neither of which is a subclass of `Exception`; thus it can break generators or process shutdown. `except Exception` will catch only "error exceptions", which in 99% of cases is what you really want. – Mikhail Gerasimov Jul 25 '19 at 09:20