How to scrape the text 64076 from Item model number using xpath expression

Question

I'm attempting to scrape the text 64076 next to Item model number: on this page using the following XPath expression:

//*[contains (@id,'productDetails')]//tr[contains(.,'Item model number')]/td|//*[contains (@id,'detail')]//descendant::li[contains(.,'Item model number')]/text()

(I'm focusing mainly on the second half of the expression.)

However, although this matches the expected text (64076) in Firebug, it is not found when using Selenium WebDriver (Java).

When I change the XPath to:

//*[contains (@id,'productDetails')]//tr[contains(.,'Item model number')]/td|//*[contains (@id,'detail')]//descendant::li[contains(.,'Item model number')]

It works; however, it also scrapes the text Item model number:, which I do not want. (I know I could parse the result using a regex, but I'm trying to understand why my XPath is not working, since I am clearly matching the actual text/number via text(), not the bold text.)

Thanks

Possible duplicate of [using XPath: how to exclude text in nested elements](https://stackoverflow.com/questions/18218264/using-xpath-how-to-exclude-text-in-nested-elements) — shmosel, Sep 17 '18 at 01:51

score 0 · Answer 1 · answered Sep 17 '18 at 02:19

It's because text() in XPath selects text nodes, but Selenium can only locate and return element nodes. Attribute nodes are likewise valid XPath results, but Selenium does not support them either.
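To see what text() actually returns, here is a minimal sketch outside Selenium that uses the JDK's built-in javax.xml.xpath engine (an XPath 1.0 implementation). The sample markup is an assumption modelled on the question, not the live page, and the class and method names are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextNodeDemo {

    // Assumed sample: a bold label followed by a bare text node
    static final String SAMPLE = "<ul><li><b>Item model number:</b> 64076</li></ul>";

    // Evaluates an XPath 1.0 expression and returns every matched node,
    // which may be text nodes rather than elements.
    static NodeList select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        Node first = select(SAMPLE, "//li/text()").item(0);
        // Prints "3 [ 64076]": node type 3 is Node.TEXT_NODE, not an element
        System.out.println(first.getNodeType() + " [" + first.getNodeValue() + "]");
    }
}
```

The expression is valid and matches; the result is simply a text node, which has no WebElement representation.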

You have to find the text node's parent (which is an element node), then use a regex or split() to extract the string you want.

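String xpath = "//ul/li[b[text()='Item model number:']][contains(. , '64076')]";
// getText() returns "Item model number: 64076", so take the part after the colon
String modelNumber = driver.findElement(By.xpath(xpath)).getText().split(":")[1].trim();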
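The split step can be tried without a browser. A minimal sketch, assuming getText() returns the string Item model number: 64076 as described in the question; ModelNumberSplit and valueAfterLabel are hypothetical names:

```java
public class ModelNumberSplit {

    // Splits "label: value" text at the first colon only, so a value that
    // itself contains a colon stays intact, then trims the surrounding spaces.
    static String valueAfterLabel(String elementText) {
        String[] parts = elementText.split(":", 2);
        if (parts.length < 2) {
            throw new IllegalArgumentException("No ':' in: " + elementText);
        }
        return parts[1].trim();
    }

    public static void main(String[] args) {
        // Assumed shape of getText() for the <li>
        System.out.println(valueAfterLabel("Item model number: 64076")); // 64076
    }
}
```

Passing a limit of 2 to split() keeps everything after the first colon, whereas split(":")[1] would truncate a value that contains another colon.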

score 0 · Answer 2 · answered Sep 17 '18 at 02:24

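This is a common problem in Selenium. XPath 1.0, which browsers (and therefore Selenium) use, does include text(), but Selenium's findElement() and findElements() can only return element nodes, so an expression that selects a text node fails with an InvalidSelectorException. The usual approach is to get the element and call getText().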

Here is a small wrapper function that returns an element's text without the text of its children:

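// Requires java.util.regex.Pattern in addition to the Selenium imports
public static String getNodeText(WebElement element) {
  String text = element.getText();
  for (WebElement child : element.findElements(By.xpath("./*"))) {
    // Quote the child's text so characters such as '(' or '.' are matched literally
    text = text.replaceFirst(Pattern.quote(child.getText()), "");
  }
  return text.trim();
}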
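The text-subtraction idea behind getNodeText() can be checked on plain strings. In this sketch, stripChildText and the sample strings are hypothetical stand-ins for the WebElement calls (the parent's and children's getText() results), and List.of assumes Java 9 or later:

```java
import java.util.List;
import java.util.regex.Pattern;

public class StripChildText {

    // Removes the first occurrence of each child's text from the parent's text,
    // quoting it so regex metacharacters in the child text match literally.
    static String stripChildText(String parentText, List<String> childTexts) {
        String text = parentText;
        for (String child : childTexts) {
            text = text.replaceFirst(Pattern.quote(child), "");
        }
        return text.trim();
    }

    public static void main(String[] args) {
        String li = "Item model number: 64076";             // assumed getText() of the <li>
        List<String> children = List.of("Item model number:"); // assumed getText() of the <b>
        System.out.println(stripChildText(li, children));    // 64076
    }
}
```

Without Pattern.quote, a label such as "Size (cm):" would be read as a regex group and would not be removed.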

Of course, you can also use string functions or a regex to extract the string in question, but that usually means writing custom extraction logic for each case.
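One way to limit that per-case logic is a single label-based regex helper. A sketch, assuming the element's getText() puts each label and its value on the same line; LabelValueRegex and valueFor are hypothetical names:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LabelValueRegex {

    // Finds the label (matched literally), skips any whitespace after it and
    // captures the rest of that line as the value.
    static Optional<String> valueFor(String text, String label) {
        Matcher m = Pattern.compile(Pattern.quote(label) + "\\s*(.+)").matcher(text);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        String li = "Item model number: 64076"; // assumed getText() of the <li>
        System.out.println(valueFor(li, "Item model number:").orElse("not found"));
    }
}
```

Because "." does not match line breaks by default, the capture stops at the end of the label's line when several label/value pairs are read at once.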

score 0 · Answer 3 · answered Sep 17 '18 at 02:31

You cannot get it directly with Selenium because it is a text node. Instead, you can use JavaScript to inspect the element's child nodes and return the text node's content.

WebElement itemModelRootNode = driver.findElement(By.xpath("//*[contains (@id,'productDetails')]//tr[contains(.,'Item model number')]/td|//*[contains (@id,'detail')]//descendant::li[contains(.,'Item model number')]"));

String script = "var t = ''; arguments[0].childNodes.forEach((node)=>{ if(node.nodeType==Node.TEXT_NODE && node.textContent.trim().length > 0) { t = node.textContent.trim(); } }); return t;";

String text = (String) ((JavascriptExecutor) driver).executeScript(script, itemModelRootNode);
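The script's filter (keep direct children of type TEXT_NODE that are not blank, and return the last one) can be mirrored offline with the JDK's org.w3c.dom API, which uses the same node types as the browser DOM. A sketch for experimenting; OwnTextNodes, ownText and the sample markup are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OwnTextNodes {

    // Returns the trimmed text of the last non-blank text node that is a direct
    // child of the element, like the JavaScript passed to executeScript.
    static String ownText(Element element) {
        String result = "";
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() == Node.TEXT_NODE && !node.getTextContent().trim().isEmpty()) {
                result = node.getTextContent().trim();
            }
        }
        return result;
    }

    // Parses a markup snippet and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Indented sample: the whitespace-only text node before <b> is skipped
        Element li = parse("<li>\n  <b>Item model number:</b> 64076\n</li>");
        System.out.println(ownText(li)); // 64076
    }
}
```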

score 0 · Answer 4 · answered Sep 17 '18 at 05:02

To expand on @Bauban's answer: Selenium doesn't allow you to locate a text node. Instead, you can evaluate your XPath with JavaScript's document.evaluate() method through JavascriptExecutor.

This is your XPath:

//div[@class='content']//li[contains(.,'Item model number:')]/text()

And this is how you can evaluate:

JavascriptExecutor js = (JavascriptExecutor)driver;
Object message = js.executeScript("var value = document.evaluate(\"//div[@class='content']//li[contains(.,'Item model number:')]/text()\",document, null, XPathResult.STRING_TYPE, null ); return value.stringValue;");
System.out.println(message.toString().trim());

You can refer to this link for more details about the evaluate function.
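The JDK's javax.xml.xpath API has the same kind of STRING result type, so the expression can be sanity-checked against a saved snippet of the markup before running it in the browser. A sketch; the sample markup, XPathStringValue and stringValue are assumptions:

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathStringValue {

    // Assumed snippet shaped like the page's product details
    static final String SAMPLE =
            "<div class=\"content\"><ul><li><b>Item model number:</b> 64076</li></ul></div>";

    // Evaluates with a STRING result type, which (like XPathResult.STRING_TYPE)
    // yields the string-value of the first selected node, or "" if none match.
    static String stringValue(String xml, String expression) {
        try {
            return (String) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(xml)), XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String expr = "//div[@class='content']//li[contains(.,'Item model number:')]/text()";
        System.out.println(stringValue(SAMPLE, expr).trim()); // 64076
    }
}
```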

score 0 · Answer 5 · answered Sep 17 '18 at 06:52

Going by the URL you shared, the text 64076 next to Item model number: is a text node, so you need to use WebDriverWait for its parent element to be visible and then read the text node through JavascriptExecutor. You can use the following solution:

Code Block:

import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class q52359631_textExtract {

    public static void main(String[] args) {
        System.setProperty("webdriver.gecko.driver", "C:\\Utility\\BrowserDrivers\\geckodriver.exe");
        WebDriver driver = new FirefoxDriver();
        driver.get("https://www.amazon.com/dp/B000TW3B9G/?tag=stackoverflow17-20");
        WebElement myElement = new WebDriverWait(driver, 20).until(ExpectedConditions.visibilityOfElementLocated(By.xpath("//td[@class='bucket']//li/b[contains(.,'Item model number:')]/..")));
        String myText = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].lastChild.textContent;", myElement);
        System.out.println(myText);
    }
}

Console Output:
```
 64076
```
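The script relies on lastChild, which is purely positional: it returns whatever node comes last, including the leading space seen in the console output. A sketch that reproduces this with the JDK's DOM API; LastChildText and both sample snippets are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class LastChildText {

    // Equivalent of "return arguments[0].lastChild.textContent;" on a parsed element.
    static String lastChildText(String xml) {
        try {
            Element li = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            return li.getLastChild().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Text node is last: prints "[ 64076]" (note the leading space)
        System.out.println("[" + lastChildText("<li><b>Item model number:</b> 64076</li>") + "]");
        // An element is last: prints "[(model)]", not the number
        System.out.println("[" + lastChildText("<li>64076 <b>(model)</b></li>") + "]");
    }
}
```

Calling trim() on the returned string removes the leading space; the positional dependency remains, so the locator should target an element whose last child is the wanted text.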

Yash · Answer 6 · answered Sep 17 '18 at 07:49

Try the following in the browser's developer console to get Item model number: 64076 from the test URL ($x() is the console's XPath helper):

var xpathExp = 
    "//h2[.='Product details']//parent::td//div[@class='content']/ul/li/b[contains(text(),'Item')]/parent::li/text()";
var ele = $x(xpathExp);
console.dir( ele ); // Array(1)
console.log( ele[0] ); //" 64076"

Testing the XPath online against a sample XML:

<ul>
  <li>
    <b>Item model number:</b> 64076
  </li>
</ul>

Result in the codebeautify XML tree view for //ul/li/b[contains(text(),'Item')]/parent::li/text():

ul ..
li 64076 ..
b  Item model number:
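Because the test XML is indented, text() also selects the whitespace-only text node before <b>. A sketch that evaluates the same expression on that test XML with the JDK's XPath 1.0 engine; IndentedTextNodes is a hypothetical name, and the [normalize-space()] filter is an added suggestion (a standard XPath 1.0 function), not part of the original answer:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class IndentedTextNodes {

    // The test XML above, with its indentation preserved
    static final String TEST_XML =
            "<ul>\n  <li>\n    <b>Item model number:</b> 64076\n  </li>\n</ul>";

    // Returns all nodes the expression selects in TEST_XML.
    static NodeList select(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(TEST_XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String base = "//ul/li/b[contains(text(),'Item')]/parent::li/text()";
        // Two text nodes: the indentation before <b> and " 64076" plus trailing whitespace
        System.out.println(select(base).getLength()); // 2
        // Keep only text nodes that contain something other than whitespace
        NodeList filtered = select(base + "[normalize-space()]");
        System.out.println(filtered.getLength()); // 1
        System.out.println(filtered.item(0).getNodeValue().trim()); // 64076
    }
}
```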

The same li as a JavaScript object (selected properties):

outerHTML:"<li><b>Item model number:</b> 64076</li>"
outerText:"Item model number: 64076"

tagName:"LI"
textContent:"Item model number: 64076"

lastChild:text
    data:" 64076"
    nodeValue:" 64076"
    textContent:" 64076"
    wholeText:" 64076"
lastElementChild:b

