looking for text of an element or source of current page

Question

I am doing the following in Selenium 2/WebDriver using Python and Firefox.

I am opening some web pages that I need to check for a specific string - which, if present, means it is a good page to parse.

The phrase I am looking for is an h2 element similar to this:

<h2 class="page_title">Worlds Of Fantasy : Medieval House</h2>

If that h2 is missing, I know I don't need to work on it, just return and get the next in line.

In the code I have a try/except/else block that looks for the phrase. If it finds the phrase, it passes on to the next part of the function. If not, it should go to the except clause, which tells it to return.

There are 2 pages called in this test - the first has the phrase, the second does not.

The first page is opened, and passes the test.

The second page is opened, and I get an exception report - but it never returns to the calling code in main...it just stops.

Why isn't the exception following the proper path to return?

Here is the code:

#!/usr/bin/env python

from selenium import webdriver
from selenium.webdriver import Firefox as Browser
from selenium.webdriver.support.ui import WebDriverWait


browser = webdriver.Firefox()

def call_productpage(productlink):
    global browser

    print 'in call_productpage(' + productlink + ')'
    browser.get(productlink)
    browser.implicitly_wait(8)

    #start block with <div class="page_content"> 
    product_block = browser.find_element_by_xpath("//div[@class='page_content']");

    # <h2 class="page_title">Worlds Of Fantasy : Medieval House</h2>
    try:
        product_name = product_block.find_element_by_xpath("//h2[@class='page_title']");
    except Exception, err:
        #print "Failed!\nError (%s): %s" % (err.__class__.__name__, err)
        print 'return to main()'
        return 0
    else:
        nameStr = str(product_name.text)
        print 'product_name:' + nameStr
    finally:
        print "test over!"
        return 1

test1 = call_productpage('https://www.daz3d.com/i/3d-models/-/desk-clocks?spmeta=ov&item=12657')
if test1:
    print '\ntest 1 went OK\n'
else:
    print '\ntest 1 did NOT go OK\n'

test2 = call_productpage('https://www.daz3d.com/i/3d-models/-/dierdre-character-pack?spmeta=ov&item=397')
if test2:
    print '\ntest 2 went OK\n'
else:
    print '\ntest 2 did NOT go OK\n'

And here is a screenshot of the console showing the exception I get:

[Screenshot not preserved: console output showing the exception traceback]

One other option I thought about using was to get the source of the page from the webdriver and do a find to see if the tag was there - but apparently there is no easy way to do THAT in webdriver!
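
For what it is worth, the Python WebDriver bindings expose the current page's HTML as the `page_source` property, so a plain substring check is possible. Below is a minimal sketch, assuming that matching the opening tag text exactly is good enough. `has_page_title`, `TITLE_MARKER`, and the sample strings are hypothetical names used only for illustration; with a live session the argument would be `browser.page_source`.

```python
# Marker text taken from the h2 shown in the question.
TITLE_MARKER = '<h2 class="page_title">'


def has_page_title(page_source):
    """Return True if the page_title h2 opening tag appears in the HTML."""
    return TITLE_MARKER in page_source


# Sample strings standing in for browser.page_source (assumption).
good_html = ('<div class="page_content">'
             '<h2 class="page_title">Worlds Of Fantasy : Medieval House</h2>'
             '</div>')
bad_html = '<div><p>Product not found</p></div>'

print(has_page_title(good_html))
print(has_page_title(bad_html))
```

A substring test like this is brittle: the browser may serialize attributes with different quoting or ordering, so an element lookup is usually the safer check.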

Is this for testing purposes? Is there any reason to avoid Python testing modules like `pytest` or `unittest`? I had a similar situation, and Selenium wasn't running all test cases unless they all passed. With pytest I could do that. Besides, there are other benefits too. — 0xc0de, Jan 28 '12 at 06:28
Actually, no - this is not for testing. I am using WebDriver to revise all my custom-written web scrapers, which keep breaking because the web site keeps changing the HTML...sometimes I think they do it just to bug me. WebDriver should be easier to keep current since it is more modular in nature, and I don't have to wade through as much code to find the problem. Plus, one of my clients can't get her network to access the web with the mechanize code I have been using. Hopefully she will be able to use this version. — Stephen, Feb 15 '12 at 06:16

score 1 · Answer 1 · answered Jan 28 '12 at 20:10

That's the solution! Thanks!

Here is the final code, cleaned up a bit to make the result more readable:

#!/usr/bin/env python

from selenium import webdriver
from selenium.webdriver import Firefox as Browser
from selenium.webdriver.support.ui import WebDriverWait

browser = webdriver.Firefox()

def call_productpage(productlink):
    global browser

    print 'in call_productpage(' + productlink + ')'
    browser.get(productlink)
    browser.implicitly_wait(1)
    product_block = ''
    try:
        product_block = browser.find_element_by_xpath("//div[@class='page_content']");
    except:
        print 'this is NOT a good page - drop it'
        return 0
    else:
        textStr = str(product_block.text)
        #print 'page_content:' + str(textStr)
        print '\nthis is a good page - proceed\n'

    print 'made it past the exception!\n'

    product_name = product_block.find_element_by_xpath("//h2[@class='page_title']");
    nameStr = str(product_name.text)
    print '>>> product_name:' + nameStr + '\n'
    print "test over!"
    return 1

test1 = call_productpage('https://www.daz3d.com/i/3d-models/-/desk-clocks?spmeta=ov&item=12657')
print '\nTest #1:\n============\n'
if test1:
    print '\ntest 1 returned true\n'
else:
    print '\ntest 1 returned false\n'

print '\nTest #2:\n============\n'
test2 = call_productpage('https://www.daz3d.com/i/3d-models/-/dierdre-character-pack?spmeta=ov&item=397')
if test2:
    print '\ntest 2 returned true\n'
else:
    print '\ntest 2 returned false\n'
print '\n============\n'

And that works just as I need it to.

Again, thanks.
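
A side note on why the version in the question could never report failure, even once the exception was caught: in Python, a `return` inside `finally` runs last and replaces whatever value the `except` clause returned. The sketch below is a stripped-down illustration of that control flow, not the Selenium code itself; `check` and `check_fixed` are hypothetical names, and `ValueError` stands in for the lookup failure.

```python
def check(found):
    """Mimics the shape of the question's try/except/else/finally."""
    try:
        if not found:
            raise ValueError("element missing")
    except ValueError:
        return 0   # intended result for a bad page...
    else:
        pass       # the good-page work would go here
    finally:
        return 1   # ...but a return in finally always wins


def check_fixed(found):
    """Same logic with the success return moved out of finally."""
    try:
        if not found:
            raise ValueError("element missing")
    except ValueError:
        return 0
    return 1


print(check(False))        # 1
print(check_fixed(False))  # 0
```

Dropping the `finally` block, as the final code above does, avoids this override.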

Misha Akovantsev · Accepted Answer · 2012-01-28T15:33:13.350

If you don't know which exception to expect, use a bare except together with the traceback module:

import traceback

try:
    int('string')
except:
    traceback.print_exc()
    print "returning 0"

# will print out an exception and execute everything in the 'except' clause:
# Traceback (most recent call last):
#   File "<stdin>", line 2, in <module>
# ValueError: invalid literal for int() with base 10: 'string'
# returning 0

But from the stack trace you already do know the exact exception name, so use it instead:

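# The exception classes live in selenium.common.exceptions;
# selenium.webdriver.exceptions does not exist and fails with "No module named exceptions".
from selenium.common.exceptions import NoSuchElementException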

try:
    #...
except NoSuchElementException, err:
    #...

UPDATE:

The exception is raised before the try ... except block, here:

product_block = browser.find_element_by_xpath("//div[@class='page_content']");

and not here:

product_name = product_block.find_element_by_xpath("//h2[@class='page_title']");
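
A minimal sketch of that fix, assuming the goal is simply to report whether both elements exist: move the first lookup inside the `try` so that a missing `page_content` div is caught as well. `is_product_page`, `FakeElement`, and the fallback exception class are hypothetical names added so the sketch runs without a browser; with a real session you would pass `browser` itself. It keeps the `find_element_by_xpath` call and the XPath strings from the question (Selenium 2 era API).

```python
try:
    from selenium.common.exceptions import NoSuchElementException
except ImportError:
    # Stand-in so the sketch runs without Selenium installed (assumption).
    class NoSuchElementException(Exception):
        pass

BLOCK_XPATH = "//div[@class='page_content']"
TITLE_XPATH = "//h2[@class='page_title']"


def is_product_page(finder):
    """Return True only if both the content block and the title are found.

    `finder` is anything with find_element_by_xpath, e.g. the browser
    object from the question.
    """
    try:
        # Both lookups sit inside the try, so either one failing is handled.
        block = finder.find_element_by_xpath(BLOCK_XPATH)
        block.find_element_by_xpath(TITLE_XPATH)
    except NoSuchElementException:
        return False
    return True


class FakeElement(object):
    """Hypothetical test double that mimics present or missing elements."""

    def __init__(self, children):
        self.children = children

    def find_element_by_xpath(self, xpath):
        if xpath not in self.children:
            raise NoSuchElementException(xpath)
        return self.children[xpath]


title = FakeElement({})
good_page = FakeElement({BLOCK_XPATH: FakeElement({TITLE_XPATH: title})})
bad_page = FakeElement({})  # no page_content div at all

print(is_product_page(good_page))
print(is_product_page(bad_page))
```

Catching only `NoSuchElementException` rather than using a bare `except` means other failures, such as a lost browser session, still surface with a traceback.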

The traceback gave me the same result. Adding the import gave me a new error: No Module named exceptions... — Stephen, Jan 28 '12 at 06:54
