6

I am trying to use selenium/phantomjs with scrapy and I'm riddled with errors. For example, take the following code snippet:

def parse(self, response):

    while True:
        try:
            driver = webdriver.PhantomJS()
            # do some stuff
            driver.quit()
            break
        except (WebDriverException, TimeoutException):
            try:
                driver.quit()
            except UnboundLocalError:
                print "Driver failed to instantiate"
            time.sleep(3)
            continue

A lot of the time the driver seems to fail to instantiate (so the driver is unbound, hence the exception), and I get the following message (along with the print message I put in):

Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.phantomjs.service.Service object at 0x7fbb28dc17d0>> ignored

Googling around, it seems everyone suggests updating phantomjs, which I have done (1.9.8 built from source). Would anyone know what else could be causing this problem and how to diagnose it?

pad

3 Answers

6

The reason for this behavior is how the PhantomJS driver's Service class is implemented.

A __del__ method is defined that calls self.stop():

def __del__(self):
    # subprocess.Popen doesn't send signal on __del__;
    # we have to try to stop the launched process.
    self.stop()

And self.stop() assumes the service instance is still alive when it accesses its attributes:

def stop(self):
    """
    Cleans up the process
    """
    if self._log:
        self._log.close()
        self._log = None
    #If its dead dont worry
    if self.process is None:
        return

    ...
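
The failure mode can be reproduced without Selenium. The sketch below uses a hypothetical FakeService class (not Selenium's code) and assumes that process is only assigned by a successful start(), so a failed launch leaves the attribute missing by the time stop() runs:

```python
class FakeService(object):
    """Hypothetical stand-in for the Service class; not Selenium's code."""

    def __init__(self):
        self._log = None
        # Assumption: `process` is deliberately NOT set here;
        # it would only be assigned by a successful start().

    def start(self):
        # Simulate the launch failing before `self.process` is assigned.
        raise RuntimeError("simulated launch failure")

    def stop(self):
        if self._log:
            self._log.close()
            self._log = None
        # Raises AttributeError if start() never assigned `process`.
        if self.process is None:
            return


def stop_after_failed_start():
    """Return the name of the exception stop() raises after a failed start."""
    service = FakeService()
    try:
        service.start()
    except RuntimeError:
        pass
    try:
        service.stop()
    except AttributeError as exc:
        return type(exc).__name__
    return None
```

When stop() is called from __del__ instead, Python cannot propagate the exception to the caller, so it only prints the "Exception ... ignored" message seen in the question.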

The same exact problem is perfectly described in this thread:


What you should do is to silently ignore AttributeError occurring while quitting the driver instance:

try:
    driver.quit()
except AttributeError:
    pass
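
As a rough sketch, the same idea can be wrapped in a small helper. The name safe_quit is hypothetical; it also accepts None, on the assumption that the calling code sets driver = None before the try block, which avoids the UnboundLocalError from the question:

```python
def safe_quit(driver):
    """Quit `driver` if it exists, ignoring the stray AttributeError.

    Returns True if quit() completed, False otherwise.
    """
    if driver is None:
        # The driver was never instantiated, so there is nothing to quit.
        return False
    try:
        driver.quit()
    except AttributeError:
        # Raised when the underlying service never got a `process` attribute.
        return False
    return True
```

In a retry loop, this would be called from the except branch (or a finally block) in place of the bare driver.quit().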

The problem was introduced by this revision, which means that downgrading to 2.40.0 would also help.

alecxe
  • thanks, I'll do this. Any ideas for the diagnosis? I changed the number of concurrent requests to 1 but it didn't help. My guess was too many processes are being spawned which might cause the error. – pad Dec 28 '14 at 06:37
  • @pad did catching `AttributeError` help? – alecxe Dec 28 '14 at 06:38
  • sorry I'll be able to report on that in an hour (another spider is running). Although this way we're just sweeping this error under the rug, right :). – pad Dec 28 '14 at 06:40
  • will also try the downgrade solution and report back. – pad Dec 28 '14 at 06:41
  • I tried both solutions. Catching the `AttributeError` is suppressing its display nicely as you wrote. Unfortunately, downgrading to 2.40.0 did not help with the error itself. – pad Dec 28 '14 at 16:24
  • @pad thanks for the update, could you also try `2.39.0`? – alecxe Dec 28 '14 at 16:27
  • I tried 2.39.0, still no success. This is very strange. – pad Dec 28 '14 at 16:56
  • so from what I understand from your answer, if subprocess.Popen.__init__ fails, the new instance is destroyed and its __del__ method is called, but __init__ never got the chance to execute, so the field is not defined, which raises an AttributeError? Would that also mean that any child processes created would be out in the wild and that could cause continuous failures? – pad Dec 29 '14 at 19:54
  • @pad it is basically about how garbage collection works in Python. The general golden rule: do not assume the attributes of `self` are defined inside the `__del__()` method. If you want more info about the problem, just google `__del__ python self`. Hope that helps. – alecxe Dec 29 '14 at 20:20
  • @pad ah, yeah, also, in other words `__del__` is not an opposite of `__init__`. – alecxe Dec 29 '14 at 20:20
  • I think the problem is that the v2.44.0 available on PyPI differs from the most recent code. Thanks for helping with this. – pad Jan 01 '15 at 05:53
2

I had that problem because phantomjs was not available from the script (it was not in the PATH). You can check this by running phantomjs in a console.
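
The same check can be done from Python before the driver is created. This is a minimal sketch assuming Python 3.3+ (for shutil.which); the helper name find_phantomjs is hypothetical:

```python
import shutil


def find_phantomjs(name="phantomjs"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)
```

If the result is None, the script's environment cannot see the binary. One option is then to pass the full path explicitly through the executable_path argument of webdriver.PhantomJS.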

Oleksandr Slynko
  • My problem was memory. I've made a small patch to service.py which gets rid of this stray error. Maybe I make a pull request when I've investigated it more. Contd.. – pad Dec 31 '14 at 13:18
  • The executable is not the problem because `phantomjs` is the default used by `__init__` and it's the same on my system. We're both seeing the same error because `self.stop()` is called when there is no `self.process` to terminate. So in both cases, something went wrong -> process isn't defined -> an AttributeError is raised. – pad Dec 31 '14 at 13:21
0

Selenium version 2.44.0 on PyPI needs the following patch in Service.__init__ of selenium.webdriver.phantomjs.service:

self.process = None

I was thinking of submitting a patch, but this fix already exists in the most recent version on Google Code.
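
To illustrate why that one line is enough, here is a sketch using a hypothetical PatchedService class (a stand-in, not Selenium's code). It assumes stop() has the shape quoted in the accepted answer, and the return values are added purely for illustration:

```python
class PatchedService(object):
    """Hypothetical stand-in showing the effect of the one-line patch."""

    def __init__(self):
        self._log = None
        self.process = None  # the patch: always define the attribute

    def stop(self):
        if self._log:
            self._log.close()
            self._log = None
        # Safe now: the attribute exists even if nothing was launched.
        if self.process is None:
            return "nothing to stop"
        self.process.kill()
        self.process.wait()
        self.process = None
        return "stopped"

    def __del__(self):
        # No AttributeError here, even if the process was never started.
        self.stop()
```

With the attribute defined up front, the garbage-collection path no longer prints the "Exception AttributeError ... ignored" message, although whatever made the launch fail in the first place still has to be diagnosed separately.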

pad