
I have a script that is making requests using multiple headless browsers (selenium + Chrome drivers), each of which makes HTTP requests over a different SOCKS proxy. All of the requests are made inside of a concurrent.futures.ThreadPoolExecutor(). For some reason, I am periodically getting the error ResponseNotReady: Idle, and I can't understand why.

My questions are:

1) What is causing this ResponseNotReady error? Is it something I'm doing wrong, or a normal exception that I just need to catch and respond to?

2) How can I properly handle the ResponseNotReady exception? What is the best way to recover from this?

Here is the function where I am making the requests:

def _fetch_selenium(self, url, session, port):
    domain = self.domainFromURL(url)
    with self.locks[port][domain]: 
        try:
            start_time = datetime.now()
            session.get(url)
            sleep(self.delay) 
            return {'url': url, 
                    'html': session.page_source, 
                    'time': datetime.now() - start_time, 
                    'proxy_port': port}

        except selenium_exceptions.WebDriverException as e:
            print("Request of URL " + url + " failed with exception: " + str(e))
            sleep(self.delay)          
            return {'url': url, 
                    'html': None, 
                    'time': datetime.now() - start_time, 
                    'proxy_port': port}
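
A note on the handler above: `ResponseNotReady` is defined in `http.client`, not in Selenium, so an `except` clause that names only `WebDriverException` would not catch it. Below is a minimal sketch of a fetch helper that also treats `http.client.HTTPException` as a failed request. The names `fetch_page` and `extra_exceptions` are hypothetical and only for illustration. Whether the session is still usable after such an error is an open question this sketch does not answer.

```python
import http.client


def fetch_page(session, url, extra_exceptions=()):
    """Return the page source, or None if talking to the driver fails.

    `session` is any object with a get(url) method and a page_source
    attribute. `extra_exceptions` is a hypothetical hook where selenium's
    WebDriverException could be passed in.
    """
    handled = (http.client.HTTPException,) + tuple(extra_exceptions)
    try:
        session.get(url)
        return session.page_source
    except handled as e:
        # ResponseNotReady is a subclass of HTTPException, so it lands here.
        print("Request of URL " + url + " failed with exception: " + repr(e))
        return None
```

In the real code this would be called as something like `fetch_page(session, url, (selenium_exceptions.WebDriverException,))`.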

And here is the code where I am dispatching requests to different selenium sessions (the fetch() function basically just ends up calling _fetch_selenium()):

def fetchConcurrent(self, urls):

    results = []

    timeouts = defaultdict(int)
    with ThreadPoolExecutor(max_workers=self.num_threads) as executor:
        futures = []
        for url in urls:
            session = self.sessions.popleft()
            futures.append(executor.submit(self.fetch, url, session))
            self.sessions.append(session)

        for future in as_completed(futures):

            result, session = future.result()
            results.append(result)
            if not result['html']:
                socks_port = result['proxy_port']
                print(f"Got no HTML for url {result['url']}, using port {socks_port}.")
                timeouts[socks_port] += 1
                if timeouts[socks_port] > MAX_TIMEOUTS_PER_CLIENT:
                    tor_client_pool.replaceClient(socks_port)
                    self.killSeleniumSession(session)
                    self.sessions.remove(session)
                    self.newSeleniumSession(socks_port)
                    timeouts[socks_port] = 0
                continue

            print(f"GOT: {result['url'].strip()} in {result['time']}, using proxy on port {result['proxy_port']}")

    return results
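
One possibility worth ruling out: because sessions are handed out round-robin, the same session can be submitted to the pool more than once when there are more URLs than sessions, and the locks in `_fetch_selenium()` are keyed by port and domain, not by session. As far as I know, a WebDriver instance is not meant to be driven from two threads at once. The sketch below shows one way to serialize access to a single session. `LockedSession` is a hypothetical wrapper, not part of Selenium, and it is not confirmed to be the fix for this error.

```python
import threading


class LockedSession:
    """Wrap a driver-like object so only one thread uses it at a time.

    Hypothetical helper: `driver` is anything with get(url) and page_source.
    """

    def __init__(self, driver):
        self._driver = driver
        self._lock = threading.Lock()

    def fetch(self, url):
        # Hold the lock for the whole request/response cycle, including
        # reading page_source, so two threads never interleave commands.
        with self._lock:
            self._driver.get(url)
            return self._driver.page_source
```

With this wrapper, submitting the same session to several worker threads would make them queue up instead of sharing one driver connection concurrently.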

When I run the above code, it successfully downloads many pages, but eventually it hits a page where the ResponseNotReady error is raised. I can't tell what about that page triggers it. Here is the traceback I see when the error occurs:

~/Code/gis project/code/TorGetter.py in _fetch_selenium(self, url, session, port)
    202             try:
    203                 start_time = datetime.now()
--> 204                 session.get(url)
    205                 sleep(self.delay)           # no other requests to this domain can be made by this tor client while we sleep() here
    206                 return {'url': url, 

~/.local/share/virtualenvs/code-pIyQci_2/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py in get(self, url)
    324         Loads a web page in the current browser session.
    325         """
--> 326         self.execute(Command.GET, {'url': url})
    327 
    328     @property

~/.local/share/virtualenvs/code-pIyQci_2/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py in execute(self, driver_command, params)
    310 
    311         params = self._wrap_value(params)
--> 312         response = self.command_executor.execute(driver_command, params)
    313         if response:
    314             self.error_handler.check_response(response)

~/.local/share/virtualenvs/code-pIyQci_2/lib/python3.6/site-packages/selenium/webdriver/remote/remote_connection.py in execute(self, command, params)
    470         data = utils.dump_json(params)
    471         url = '%s%s' % (self._url, path)
--> 472         return self._request(command_info[0], url, body=data)
    473 
    474     def _request(self, method, url, body=None):

~/.local/share/virtualenvs/code-pIyQci_2/lib/python3.6/site-packages/selenium/webdriver/remote/remote_connection.py in _request(self, method, url, body)
    494             try:
    495                 self._conn.request(method, parsed_url.path, body, headers)
--> 496                 resp = self._conn.getresponse()
    497             except (httplib.HTTPException, socket.error):
    498                 self._conn.close()

/usr/lib/python3.6/http/client.py in getresponse(self)
   1319         #
   1320         if self.__state != _CS_REQ_SENT or self.__response:
-> 1321             raise ResponseNotReady(self.__state)
   1322 
   1323         if self.debuglevel > 0:

ResponseNotReady: Idle
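
For reference, the last frame of the traceback is a state check inside `http.client`: `getresponse()` raises `ResponseNotReady` whenever the connection object is not in the "request sent" state, and "Idle" is the name of the state it found. The minimal sketch below reproduces the same message without Selenium by asking for a response before any request has been sent. The host and port are arbitrary placeholders, and no network connection is opened.

```python
import http.client

# Constructing the object does not open a socket; it starts in the idle state.
conn = http.client.HTTPConnection("localhost", 9)

message = None
try:
    # No request() was issued first, so there is no response to read.
    conn.getresponse()
except http.client.ResponseNotReady as e:
    message = str(e)

print(message)
```

This only shows how the error arises at the `http.client` level. It does not by itself explain why the driver's connection ends up idle between `request()` and `getresponse()` in the traceback above.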

Any ideas what is going on here, and how to fix it? Thanks!

J. Taylor
  • not sure but it might help - https://stackoverflow.com/questions/8385281/responsenotready-for-really-simple-python-http-request – Prany May 18 '18 at 04:22
  • @Prany Thanks. I saw that one, as well as this one --> https://stackoverflow.com/questions/3231543/python-httplib-responsenotready ... and I feel like they are probably related (albeit for `httplib`, not `selenium`). But I am not sure how to fix my code, based on what I read there. I feel like it might have something to do with how I'm handling exceptions in `_fetch_selenium()`, that is leaving the session in the wrong state for next request. But I can't find what I need to clear it up. – J. Taylor May 18 '18 at 04:24
