9

I'm trying to use Selenium (in Python) to extract some information from a website. I've been selecting elements with XPath expressions but am having trouble with the following-sibling axis. The HTML is as follows:

<span class="metadata">
    <strong>Photographer's Name: </strong>
    Ansel Adams
</span>

I can select "Photographer's Name" with

In [172]: metaData = driver.find_element_by_class_name('metadata')

In [173]: metaData.find_element_by_xpath('strong').text
Out[173]: u"Photographer's Name:"

I'm trying to select the text that follows the <strong> tag ('Ansel Adams' in the example). I assumed I could use the following-sibling axis, but I receive the following error:

In [174]: metaData.find_element_by_xpath('strong/following-sibling::text()')
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (328, 0))
... [NOTE: Omitted the traceback for brevity] ...
InvalidSelectiorException: Message: u'The given selector strong/following-sibling::text() is either invalid or does not result in a WebElement. The following error occurred:\n[InvalidSelectorError] The result of the xpath expression "strong/following-sibling::text()" is: [object Text]. It should be an element.' 

Any ideas as to why this isn't working?

alukach

3 Answers

8

@RossPatterson is correct. The trouble is that the text 'Ansel Adams' is a text node, not a WebElement, so find_element and find_elements cannot return it. If you change your HTML to

<span class="metadata">
    <strong>Photographer's Name: </strong>
    <strong>Ansel Adams</strong>
</span>

then find_element_by_xpath('strong/following-sibling::*[1]').text returns 'Ansel Adams'.

shamp00
  • Unfortunately, I don't have control over the HTML content. It's strange, though, since the code works in online [XPath testers]. Well, this leads me to a second question: is it possible to get all of the contents of ` – alukach Jan 20 '12 at 03:19
  • You could always use `driver.page_source` to get the HTML of the whole page, and then use [something other than webdriver to parse it](http://stackoverflow.com/questions/8692/how-to-use-xpath-in-python). – shamp00 Jan 20 '12 at 10:43
  • Great, I didn't know about `driver.page_source`, this makes my day, thanks! – alukach Jan 21 '12 at 08:53
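
The `driver.page_source` route suggested above could look roughly like the sketch below. It assumes the markup is well-formed enough for the standard library's `xml.etree.ElementTree`; real pages often are not, so a more tolerant HTML parser would likely be needed in practice. The `page_source` string is a hard-coded stand-in for `driver.page_source`, and `text_after_strong` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Stand-in for driver.page_source, using the markup from the question.
page_source = """<span class="metadata">
    <strong>Photographer's Name: </strong>
    Ansel Adams
</span>"""


def text_after_strong(markup):
    """Return the text that follows the first <strong> child (hypothetical helper)."""
    root = ET.fromstring(markup)
    strong = root.find("strong")
    if strong is None or strong.tail is None:
        return None
    # ElementTree stores the text that follows an element in that element's .tail
    return strong.tail.strip()


print(text_after_strong(page_source))
```

The point of the sketch is that a general-purpose parser exposes the text node directly, whereas WebDriver's `find_element` methods only return elements.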
3

This is documented in this Selenium bug report: http://code.google.com/p/selenium/issues/detail?id=5459

"Your xpath doesn't return an element; it returns a text node. While this might have been perfectly acceptable in Selenium RC (and by extension, Selenium IDE), the methods on the WebDriver WebElement interface require an element object, not just any DOM node object. WebDriver is working as intended. To fix the issue, you'd need to change the HTML markup to wrap the text node inside an element, like a ."

user2707671
  • Unfortunately it's hard to find actual documentation that documents the intention that "the methods on the WebDriver WebElement interface require an element object, not just any DOM node object," contrary to the case with Selenium RC. I finally found something here: http://seleniumhq.github.io/selenium/docs/api/java/org/openqa/selenium/WebElement.html WebElement, the type returned by findElement, "Represents an HTML element". – LarsH Oct 02 '15 at 14:52
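
Since WebDriver only hands back elements, one possible workaround (not something proposed in the bug report) is to let the browser read the text node and return it as a plain string. The sketch below assumes `driver` is a live WebDriver instance and `element` is an element that has already been located, such as the `strong` element from the question. `following_text` is a hypothetical helper, and the script assumes the node immediately after the element is the text of interest.

```python
# JavaScript run in the browser: return the text of the node right after the element.
SCRIPT = (
    "var node = arguments[0].nextSibling;"
    "return node ? node.textContent : null;"
)


def following_text(driver, element):
    """Return the stripped text of the node following `element`, or None (hypothetical helper)."""
    raw = driver.execute_script(SCRIPT, element)
    return raw.strip() if raw is not None else None
```

With the question's markup, `following_text(driver, metaData.find_element_by_xpath('strong'))` would be expected to yield the photographer's name, though this has not been checked against every browser driver.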
2

To get the text "Ansel Adams", just use metaData.get_text(). I don't believe find_element_by_* will allow you to find a text node.

Ross Patterson
  • Seems like `metaData.get_text()` would give you `Photographer's Name: Ansel Adams`. According to the documentation at http://release.seleniumhq.org/selenium-remote-control/0.9.2/doc/dotnet/Selenium.ISelenium.GetText.html, "This command uses either the textContent (Mozilla-like browsers) or the innerText (IE-like browsers) of the element, which is the rendered text shown to the user." – LarsH Oct 02 '15 at 14:35
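
Following the comment above, if the element's full text comes back as `Photographer's Name: Ansel Adams`, the value can be recovered by removing the label. This is a minimal sketch that assumes the label sits at the start of the full text. `strip_label` is a hypothetical helper, and the two strings stand in for `metaData.text` and `metaData.find_element_by_xpath('strong').text` (the question itself uses the `.text` property in the Python bindings).

```python
def strip_label(full_text, label_text):
    """Remove a leading label from an element's text (hypothetical helper)."""
    full_text = full_text.strip()
    label_text = label_text.strip()
    if full_text.startswith(label_text):
        return full_text[len(label_text):].strip()
    # Label not at the start: return the text unchanged rather than guess.
    return full_text


# Stand-ins for metaData.text and the <strong> element's .text
print(strip_label("Photographer's Name: Ansel Adams", "Photographer's Name:"))
```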