
I'm having strange issues running a threaded script on Python 2.7.13. Sometimes python.exe crashes outright with no error message, sometimes the script simply hangs and stops making progress, and sometimes I actually get an error message:

    Exception in thread Thread-370:
    Traceback (most recent call last):
      File "C:\Python27\lib\threading.py", line 801, in __bootstrap_inner
        self.run()
      File "C:\Python27\lib\threading.py", line 754, in run
        self.__target(*self.__args, **self.__kwargs)
      File ".\1024.py", line 38, in loadData
        result = play_scraper.similar(app_id, results=60)
      File "C:\Python27\lib\site-packages\play_scraper\api.py", line 92, in similar
        return s.similar(app_id, **kwargs)
      File "C:\Python27\lib\site-packages\play_scraper\scraper.py", line 435, in similar
        response = send_request('GET', url)
      File "C:\Python27\lib\site-packages\play_scraper\utils.py", line 128, in send_request
        verify=verify)
      File "C:\Python27\lib\site-packages\requests\sessions.py", line 501, in get
        return self.request('GET', url, **kwargs)
      File "C:\Python27\lib\site-packages\requests\sessions.py", line 488, in request
        resp = self.send(prep, **send_kwargs)
      File "C:\Python27\lib\site-packages\requests\sessions.py", line 609, in send
        r = adapter.send(request, **kwargs)
      File "C:\Python27\lib\site-packages\requests\adapters.py", line 423, in send
        timeout=timeout
      File "C:\Python27\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 594, in urlopen
        chunked=chunked)
      File "C:\Python27\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 350, in _make_request
        self._validate_conn(conn)
      File "C:\Python27\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 835, in _validate_conn
        conn.connect()
      File "C:\Python27\lib\site-packages\requests\packages\urllib3\connection.py", line 281, in connect
        conn = self._new_conn()
      File "C:\Python27\lib\site-packages\requests\packages\urllib3\connection.py", line 138, in _new_conn
        (self.host, self.port), self.timeout, **extra_kw)
      File "C:\Python27\lib\site-packages\requests\packages\urllib3\util\connection.py", line 79, in create_connection
        sock = socket.socket(af, socktype, proto)
      File "C:\Python27\lib\site-packages\gevent\_socket2.py", line 124, in __init__
        self._read_event = io(fileno, 1)
      File "gevent.libev.corecext.pyx", line 487, in gevent.libev.corecext.loop.io (src/gevent/libev/gevent.corecext.c:6680)
      File "gevent.libev.corecext.pyx", line 835, in gevent.libev.corecext.io.__init__ (src/gevent/libev/gevent.corecext.c:11088)
    IOError: cannot watch more than 1024 sockets

My script looks like this:

    import requests
    from threading import Thread
    import play_scraper

    with open('apps.txt') as f:
        app_idList = f.read().splitlines()

    checkedIds = 0

    def safe_print(content):
        print "{0}\n".format(content),

    def loadData(threadName, app_id):

        global checkedIds

        safe_print(threadName + str(checkedIds) + " Checking similar apps to " + app_id)
        result = play_scraper.similar(app_id, results=60)

        checkedIds += 1

    for app_id in app_idList:

        t = Thread(target=loadData, args=("Thread #0: ", app_id))
        t.start()
        t.join()
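Since each thread opens its own socket, one idea I have been experimenting with is capping how many threads can run at once with a `threading.Semaphore`. This is only an illustrative sketch, not my real script: the scraper call is replaced by a placeholder so it is self-contained, and the cap of 10 is an arbitrary example value.

```python
import threading

MAX_CONCURRENT = 10  # arbitrary example cap on simultaneous workers
slots = threading.Semaphore(MAX_CONCURRENT)

results = []
results_lock = threading.Lock()

def load_data(app_id):
    # Block until one of the MAX_CONCURRENT slots is free, so no more
    # than that many workers (and hence sockets) run at the same time.
    with slots:
        data = app_id  # placeholder for play_scraper.similar(app_id, results=60)
        with results_lock:
            results.append(data)

app_ids = ["com.example.app%d" % i for i in range(50)]
threads = [threading.Thread(target=load_data, args=(a,)) for a in app_ids]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # all 50 ids processed, at most 10 at a time
```

Unlike my loop above, this actually starts all the threads before joining them, so they run concurrently but never more than `MAX_CONCURRENT` at once.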

After about 365-375 loops I get the error message above. I'm using the play_scraper module for my project, and the offending code looks like this:

    def send_request(method, url, data=None, params=None, headers=None, verify=True):

        data = {} if data is None else data
        params = {} if params is None else params
        headers = default_headers() if headers is None else headers
        if not data and method == 'POST':
            data = generate_post_data()

        try:
            response = requests.request(
                method=method,
                url=url,
                data=data,
                params=params,
                headers=headers,
                verify=verify)
            if not response.status_code == requests.codes.ok:
                response.raise_for_status()
        except requests.exceptions.RequestException as e:
            log.error(e)
            raise

        return response

I read somewhere that the issue could be due to open connection sockets and that using a session would fix the problem. I edited the function as follows, but I still have the same issue:

    def send_request(method, url, data=None, params=None, headers=None, verify=True):

        data = {} if data is None else data
        params = {} if params is None else params
        headers = default_headers() if headers is None else headers
        if not data and method == 'POST':
            data = generate_post_data()

        try:
            s = requests.Session()
            if method == 'POST':
                response = s.post(
                    url=url,
                    data=data,
                    params=params,
                    headers=headers,
                    verify=verify)
            else:
                response = s.get(
                    url=url,
                    data=data,
                    params=params,
                    headers=headers,
                    verify=verify)
            if not response.status_code == requests.codes.ok:
                response.raise_for_status()
        except requests.exceptions.RequestException as e:
            log.error(e)
            raise
        finally:
            s.close()

        return response

If I run a simple single-threaded loop instead, everything works fine:

    import play_scraper


    with open('apps.txt') as f:
        app_idList = f.read().splitlines()

    checkedIds = 0

    for app_id in app_idList:
        print str(checkedIds ) + " Checking similar apps to " + app_id
        result = play_scraper.similar(app_id, results=60)
        checkedIds += 1

The play_scraper module can be found at https://github.com/danieliu/play-scraper

How would I go about fixing this issue?

John Baker
  • You probably just need to increase the open file descriptor limit on your system. For more, see here: [increase ulimit for # of file descriptors](https://stackoverflow.com/questions/11017402/increase-ulimit-for-of-file-descriptors) - the default is usually 1024 on Linux. – Fady Saad Aug 21 '17 at 23:40
  • @FadySaad I'm running it on Windows 10 64bit – John Baker Aug 21 '17 at 23:42
  • I'm just telling you that the default in Linux is 1024; check what it is for Windows. – Fady Saad Aug 21 '17 at 23:43
  • 1024 is a lot of *fd*s, but it doesn't get even close to what the _OS_ can support (_Ux_: ~200k in my case). Check [\[SO\]: Windows limitation on number of simultaneously opened sockets/connections per machine](https://stackoverflow.com/questions/9487569/windows-limitation-on-number-of-simultaneously-opened-sockets-connections-per-machine). But I can't help noticing the _threads_, and when it comes to _Python_, _GIL_ is the dreaded acronym that pops up. – CristiFati Aug 21 '17 at 23:54
  • You need to not do any processing at the top level of the script as Windows doesn't fork. Put everything in a `main` function. See [the documentation](https://docs.python.org/2/library/multiprocessing.html#windows) – Peter Wood Aug 22 '17 at 00:08
  • @PeterWood Do you mean running it [like this](https://pastebin.com/B866eTfH) ? If so, I tried it and I get the same problem. – John Baker Aug 22 '17 at 01:36

0 Answers