I have a problem with PhantomJS: it can hang inside a loop without reporting any error. I know my code is fine, because after a restart it normally completes, or hangs at some different, later point. What I have in mind is something like this:

i = 0
while i < len(url_list):
    try:
        driver.get(url_list[i])
        # do whatever needs to be done
        i = i+1
        # go on the next one
    except ThisIterationTakesTooLong:
        # try again for this one because the code is definitely good
        continue

Is it even possible to do something like this? Basically, I want something running in the background that checks how long the current iteration has been running. I know about time.time(), but the problem is that it can't measure anything if the program hangs on a command before the time check is reached.


EDIT
After looking at the suggested question, I still have the problem because that signal module doesn't work as it should.

import signal
signal.alarm(5)

This throws "AttributeError: 'module' object has no attribute 'alarm'"
So it looks like I can't really use this.
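
For context, `signal.alarm` is only available on Unix, which would explain the AttributeError on Windows. A portable alternative is to run the slow call in a background thread and stop waiting after a deadline. The following is only a minimal sketch of that idea: `call_with_timeout` and `IterationTimeout` are hypothetical names, not part of Selenium or the standard library, and the hung call is abandoned rather than killed, so the driver would probably still need to be quit and recreated afterwards.

```python
import threading


class IterationTimeout(Exception):
    """Raised when the wrapped call does not finish in time (hypothetical name)."""


def call_with_timeout(func, timeout, *args, **kwargs):
    """Run func(*args, **kwargs) in a daemon thread, waiting at most `timeout` seconds."""
    result = {}

    def target():
        # Store either the return value or the exception for the caller.
        try:
            result["value"] = func(*args, **kwargs)
        except Exception as exc:
            result["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)

    if worker.is_alive():
        # The worker cannot be killed from here; it is only abandoned.
        raise IterationTimeout("call exceeded %s seconds" % timeout)
    if "error" in result:
        raise result["error"]
    return result["value"]
```

In the loop from the question, this could be used as `call_with_timeout(driver.get, 30, url_list[i])` inside the `try`, with `except IterationTimeout:` in place of the made-up `ThisIterationTakesTooLong`.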

DoctorEvil
  • Possible duplicate of [break the function after certain time](https://stackoverflow.com/questions/25027122/break-the-function-after-certain-time) – Green Cloak Guy Jun 12 '18 at 12:10
  • Generally, for libraries that retrieve data over networks, the interface for doing so has some kind of `timeout` argument, or similar, that causes the call to return after some amount of time, whether it succeeded or not. Check the phantomjs documentation to see if your driver object supports this. – Kevin Jun 12 '18 at 12:11
  • @a625993 yes that looks like what I would need. – DoctorEvil Jun 12 '18 at 12:22
  • @Kevin there is a TimeoutException in Selenium, but as far as I know Selenium returns that by itself if something happens. In my code, it happens that the program just does nothing for 5 minutes or so when it should do the job in 5 seconds, so I think that exception won't get called for some reason. – DoctorEvil Jun 12 '18 at 12:24
  • Wait, are you using Selenium or PhantomJS? I'm confused. Can you provide a [mcve], complete with import statements? – Kevin Jun 12 '18 at 12:26
  • Both. Selenium is a module for Python and PhantomJS is a headless browser. I know about things like waiting for elements, exceptions that Selenium throws and such, but this can't help me here, since the code runs well 99.9% of the time and then for some reason it just halts without any error or anything. It's definitely not waiting for too long, I'm sure. – DoctorEvil Jun 12 '18 at 13:14

1 Answer

I've run into this kind of thing before and, unfortunately, there's no pretty way around it. The fact is, sometimes pages/elements just won't load, and you have to make a choice about it. I usually end up doing something like this:

from selenium.common.exceptions import TimeoutException

# How long to wait for page before timeout
driver.set_page_load_timeout(10)

def wait_for_url(driver, url, max_attempts):
    """Make multiple attempts to load page
    according to page load timeout, and
    max_attempts."""

    attempts = 0

    while attempts < max_attempts:

        try:
            driver.get(url)
            return True

        except TimeoutException:
            # Prepare for another attempt
            attempts += 1

    # Bail once max_attempts is exhausted
    return False

# We'll use this if we find any urls that won't load
# so we can process later. 
revisit = []

for url in url_list:

    # Make 10 attempts before giving up.
    url_is_loaded = wait_for_url(driver, url, 10)

    if url_is_loaded:
        pass  # Do whatever

    else:
        revisit.append(url)

# Now we can try to process those unvisited URLs.
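
Since the asker reports that a restart usually clears the hang, one variation on the retry loop is to throw away the browser after a failed attempt and start a fresh one. This is a rough sketch under assumptions, not a tested recipe for PhantomJS: `process_urls`, `make_driver`, and `handle_page` are hypothetical names, and `retry_on` stands for whichever exception should trigger a retry (for example Selenium's `TimeoutException`).

```python
def process_urls(url_list, make_driver, handle_page, retry_on, max_attempts=3):
    """Visit each URL, replacing the driver after every failed attempt.

    make_driver:  hypothetical zero-argument callable returning a new driver
    handle_page:  hypothetical callable(driver, url) doing the real work
    retry_on:     exception type (or tuple of types) that triggers a retry
    Returns the list of URLs that failed on every attempt.
    """
    revisit = []
    driver = make_driver()
    try:
        for url in url_list:
            for _ in range(max_attempts):
                try:
                    driver.get(url)
                    handle_page(driver, url)
                    break
                except retry_on:
                    # Discard the possibly wedged browser and start fresh.
                    try:
                        driver.quit()
                    except Exception:
                        pass
                    driver = make_driver()
            else:
                # Every attempt failed; remember the URL for later.
                revisit.append(url)
    finally:
        driver.quit()
    return revisit
```

A call might look like `process_urls(url_list, make_driver, handle_page, TimeoutException)`, where `make_driver` also sets the page load timeout on each new driver.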

I would also add that the issue might be with PhantomJS itself. Recent versions of Selenium deprecate it. In my experience, PhantomJS is sluggish and prone to unexpected behavior. If you need headless, you can go with the very stable Chrome. If you're not familiar, that looks like:

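from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")

# Newer Selenium releases expect options= here instead of chrome_options=
driver = webdriver.Chrome("path/to/chromedriver", chrome_options=chrome_options)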

Maybe one of those suggestions will help.

T. Ray
  • Thanks for the tip, I will try headless Chrome. I have tried regular Chrome before and Phantom was actually faster, that's why I used it. – DoctorEvil Jun 12 '18 at 18:49