
I would like to select all nodes that have text in them.

In this example, the outer shouldBeIgnored tag should not be selected:

<shouldBeIgnored>
    <span>
        the outer Span should be selected
    </span>
</shouldBeIgnored>

Some other posts suggest something like `//*/text()`.
However, this doesn't work in Firefox.

Here is a small unit test that reproduces the problem:

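import static org.junit.Assert.assertEquals;

import java.util.List;

import org.junit.After;
import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class XpathTest {
    final WebDriver webDriver = new FirefoxDriver();

    @Test
    public void shouldNotSelectIgnoredTag() {

        this.webDriver.get("http://www.s2server.de/stackoverflow/11773593.html");

        System.out.println(this.webDriver.getPageSource());

        // This is the call that fails in Firefox: //*/text() resolves to text nodes, not elements
        final List<WebElement> elements = this.webDriver.findElements(By.xpath("//*/text()"));

        for (final WebElement webElement : elements) {
            assertEquals("span", webElement.getTagName());
        }
    }

    @After
    public void tearDown() {
        this.webDriver.quit();
    }
}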
d0x

5 Answers


If you want to select all nodes that contain text, you can use

//*[text()]

The XPath above looks for any element that contains text. Note that the `text()` predicate tests whether the current node itself has a text child, not whether its descendants do.

In your case, it will select the <span> tag, as it contains text.
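One way to check what `//*[text()]` actually matches, without a browser, is to evaluate it against the question's markup with the JDK's built-in `javax.xml.xpath` API. This is only a sketch: it assumes the snippet is parsed as well-formed XML, and the class and method names are made up for illustration. The indentation inside `<shouldBeIgnored>` is itself a whitespace-only text node, so the result depends on whether that whitespace is present.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class TextPredicateCheck {

    // Parses a well-formed XML snippet and returns the names of the nodes matched by the XPath.
    static List<String> matchTagNames(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // The question's markup, indentation included (whitespace-only text nodes).
        String indented = "<shouldBeIgnored>\n    <span>\n        the outer Span should be selected\n"
                + "    </span>\n</shouldBeIgnored>";
        // The same markup with the indentation removed.
        String compact = "<shouldBeIgnored><span>the outer Span should be selected</span></shouldBeIgnored>";

        System.out.println(matchTagNames(indented, "//*[text()]"));
        System.out.println(matchTagNames(compact, "//*[text()]"));
    }
}
```

With the indentation, both `shouldBeIgnored` and `span` are matched, because the whitespace between the tags is a text child of `shouldBeIgnored`; without it, only `span` is.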

Vaman Kulkarni
  • Sadly, your expression will also select the `shouldBeIgnored` node, because its child has text. – d0x Aug 02 '12 at 09:54
  • @ChristianSchneider No, it doesn't. You can validate it [here](http://www.mizar.dk/XPath/Default.aspx). I think it won't be selected, because this XPath looks for text in the current node, and the node `` itself does not have any text. – Vaman Kulkarni Aug 02 '12 at 10:04

You can call a JavaScript function that returns the leaf elements (elements with no child elements), which are usually the ones holding the text:

// Requires jQuery on the page
function GetTextNodes() {
    var lastNodes = [];
    $("*").each(function () {
        if ($(this).children().length == 0)
            lastNodes.push(this); // push the raw DOM element so WebDriver can return it as an element
    });
    return lastNodes;
}

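Selenium WebDriver code (C#):

// Needs: using System.Collections.Generic; using System.Linq;
IJavaScriptExecutor jscript = (IJavaScriptExecutor)driver;
var result = (IEnumerable<object>)jscript.ExecuteScript("return GetTextNodes();");
List<IWebElement> listTextNodes = result.OfType<IWebElement>().ToList();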

Something like this might work for you (it assumes jQuery and `GetTextNodes` are already defined on the page).
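To see what the leaf-element filter would pick up, the same rule can be sketched in plain Java against an `org.w3c.dom` tree. This is an offline illustration only, assuming well-formed XML input; `LeafElementCheck` and `leafTagNames` are hypothetical names, not part of Selenium or jQuery.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class LeafElementCheck {

    // Returns the tag names of all elements without child elements, in document order.
    // This mirrors the `$(this).children().length == 0` test in the JavaScript above.
    static List<String> leafTagNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList all = doc.getElementsByTagName("*"); // every element, like $("*")
            List<String> leaves = new ArrayList<>();
            for (int i = 0; i < all.getLength(); i++) {
                Element element = (Element) all.item(i);
                if (!hasChildElement(element)) {
                    leaves.add(element.getTagName());
                }
            }
            return leaves;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    // Only element children count; text and comment children are ignored, as with .children().
    private static boolean hasChildElement(Element element) {
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(leafTagNames("<shouldBeIgnored><span>text</span></shouldBeIgnored>"));
        System.out.println(leafTagNames("<div>mixed <b>bold</b><br/></div>"));
    }
}
```

The second call shows the trade-off of this rule: the empty `br` counts as a leaf, while the `div`, which has text of its own mixed with child elements, does not.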

iMatoria

I see no reason why this wouldn't work (in Java):

String text = driver.findElement(By.xpath("//span")).getText();

In the odd case that doesn't work:

String text = driver.findElement(By.xpath("//span")).getAttribute("innerHTML");
Greg
  • You are selecting all `span`s. I need to select all elements that have text, which is a different requirement. If I select all elements and then test each one with `.getText()` to see whether it has text, the operation will be very slow, because it selects hundreds of unneeded WebElements and I then have to call `.getText()` on every one of them. – d0x Aug 02 '12 at 09:21

In the end, I found out that there is no way to do this with XPath alone (because XPath's `text()` also matches the inner text of a node). As a workaround, I inject JavaScript that returns the elements selected by an XPath that contain some text of their own.

Like this:

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;

import java.util.List;

import org.junit.After;
import org.junit.Test;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class XpathTest
{
    // Injected script: evaluates the XPath relative to document.body and returns
    // only the matched elements that have a non-blank direct text child.
    //@formatter:off
    static final String JS_SCRIPT_GET_TEXT =
        "function trim(str) {" +
        "    return str.replace(/^\\s+|\\s+$/g, '');" + // "\\s" in Java becomes \s in the JS regex
        "}" +
        "function extractText(element) {" +
        "    var text = '';" +
        "    for (var i = 0; i < element.childNodes.length; i++) {" +
        "        var child = element.childNodes[i];" +
        "        if (child.nodeType === Node.TEXT_NODE) {" +
        "            var nodeText = trim(child.textContent);" +
        "            if (nodeText) {" +
        "                text += nodeText + ' ';" +
        "            }" +
        "        }" +
        "    }" +
        "    return trim(text);" +
        "}" +
        "function selectElementsHavingTextByXPath(expression) {" +
        "    var result = document.evaluate('.' + expression, document.body, null," +
        "            XPathResult.ANY_TYPE, null);" +
        "    var nodesWithText = [];" +
        "    var node = result.iterateNext();" +
        "    while (node) {" +
        "        if (extractText(node)) {" +
        "            nodesWithText.push(node);" +
        "        }" +
        "        node = result.iterateNext();" +
        "    }" +
        "    return nodesWithText;" +
        "}" +
        "return selectElementsHavingTextByXPath(arguments[0]);";
    //@formatter:on

    final WebDriver webDriver = new FirefoxDriver();

    @Test
    @SuppressWarnings("unchecked")
    public void shouldNotSelectIgnoredTag()
    {
        this.webDriver.get("http://www.s2server.de/stackoverflow/11773593.html");

        // The script evaluates "//*" as ".//*" below document.body
        final List<WebElement> elements = (List<WebElement>) ((JavascriptExecutor) this.webDriver)
                .executeScript(JS_SCRIPT_GET_TEXT, "//*");

        assertFalse(elements.isEmpty());

        for (final WebElement webElement : elements)
        {
            assertEquals("span", webElement.getTagName());
        }
    }

    @After
    public void tearDown()
    {
        this.webDriver.quit();
    }
}

I modified the unit test so that the example is testable.
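For checking the filtering rule outside a browser, the same idea (keep only matched elements that have a non-blank direct text child) can be sketched in plain Java with `javax.xml.xpath` and `org.w3c.dom`. This assumes well-formed XML input and uses made-up names; it is an offline illustration, not a replacement for the injected script, which runs against the live page.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ElementsHavingTextCheck {

    // Evaluates the XPath, then keeps only matched elements that have at least one
    // non-blank direct text child (the same rule as extractText() in the script).
    static List<String> selectTagNamesHavingText(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                if (node.getNodeType() == Node.ELEMENT_NODE && hasNonBlankText(node)) {
                    result.add(node.getNodeName());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + xpath, e);
        }
    }

    // True if any direct text child contains something other than whitespace.
    private static boolean hasNonBlankText(Node element) {
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE && !child.getNodeValue().trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String markup = "<shouldBeIgnored>\n    <span>\n        the outer Span should be selected\n"
                + "    </span>\n</shouldBeIgnored>";
        System.out.println(selectTagNamesHavingText(markup, "//*"));
    }
}
```

On the question's markup this keeps only `span`: the indentation inside `shouldBeIgnored` is whitespace-only, so it does not count as text.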

d0x

One problem with locating text nodes is that even whitespace-only strings count as valid text nodes. For example,

<tag1><tag2/></tag1>

has no text nodes, but

<tag1>  <tag2/>    </tag1>

has 2 text nodes, one with 2 spaces and another with 4 spaces.
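The two cases can be checked with a small Java sketch that parses each snippet with the JDK's DOM parser and lists the lengths of the root element's direct text children. The class and method names are hypothetical; the only assumption is that the snippets are parsed as plain XML, where whitespace between tags is kept as text nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class WhitespaceTextNodes {

    // Returns the length of every direct text child of the root element, in document order.
    static List<Integer> rootTextNodeLengths(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<Integer> lengths = new ArrayList<>();
            for (Node child = doc.getDocumentElement().getFirstChild(); child != null;
                    child = child.getNextSibling()) {
                if (child.getNodeType() == Node.TEXT_NODE) {
                    lengths.add(child.getNodeValue().length());
                }
            }
            return lengths;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootTextNodeLengths("<tag1><tag2/></tag1>"));       // no text nodes
        System.out.println(rootTextNodeLengths("<tag1>  <tag2/>    </tag1>")); // 2 spaces, 4 spaces
    }
}
```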

If you want only the text nodes that contain non-whitespace text, here is one way to do it:

//text()[string-length(normalize-space(.))>0]

or, to get their parent elements:

//*[text()[string-length(normalize-space(.))>0]]
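A quick offline check of both expressions with the JDK's `javax.xml.xpath` API, assuming the question's markup is parsed as well-formed XML (the class and method names are illustrative only). On the question's indented markup, the whitespace-only text nodes are filtered out, so only the span's text node and the `span` element remain. Only the second expression returns elements, which is the kind of result Selenium's `findElements` works with.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class NonBlankTextXPath {

    // The two expressions from this answer.
    static final String TEXT_NODES = "//text()[string-length(normalize-space(.))>0]";
    static final String PARENT_ELEMENTS = "//*[text()[string-length(normalize-space(.))>0]]";

    // Parses the snippet and returns the node-set selected by the XPath.
    static NodeList evaluate(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        String markup = "<shouldBeIgnored>\n    <span>\n        the outer Span should be selected\n"
                + "    </span>\n</shouldBeIgnored>";
        System.out.println(evaluate(markup, TEXT_NODES).getLength());
        NodeList parents = evaluate(markup, PARENT_ELEMENTS);
        for (int i = 0; i < parents.getLength(); i++) {
            System.out.println(parents.item(i).getNodeName());
        }
    }
}
```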
Dusko Delic
  • Returning text nodes is fine as far as XPath goes, but if you pass a path to a Selenium method and the path resolves to *anything other* than elements, Selenium will choke. – Louis Jan 09 '15 at 17:36