3

I'm wondering if there is any way to easily retrieve text that is sandwiched between two child elements with text? In this particular case, I'm looking to extract the text USD.

<div class="indemandProgress-raised ng-binding">
    <span class="indemandProgress-raisedAmount ng-binding" gogo-test="raised">
        $6,811,034
    </span>
    USD
    <span class="ng-binding">
        total funds raised
    </span>
</div>

Actual Format of Code in Browser

<div class="indemandProgress-raised ng-binding">
<span class="indemandProgress-raisedAmount ng-binding" gogo-test="raised">$6,811,034</span> USD <span class="ng-binding">total funds raised</span>
</div>

Is this possible with XPath alone, or would I have to extract all of the text and then parse it?

It has to work with Selenium.

oldboy

3 Answers

1

You've already accepted an answer, but note that text.split()[1] is quite an unreliable solution and will not be applicable in many other cases. For instance, it breaks if the first text node contains spaces:

$ 6,811,034
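To make that failure concrete, here is a minimal sketch in plain Python (no browser needed). The two strings are assumed examples of what `.text` would return for the parent div; the second one is hypothetical and only differs by a space after the dollar sign.

```python
# Assumed .text output of the parent div in the two cases discussed above.
compact = "$6,811,034 USD total funds raised"
spaced = "$ 6,811,034 USD total funds raised"  # hypothetical variant with a space

# split() breaks on any whitespace, so index 1 is simply "the second word".
compact_result = compact.split()[1]
spaced_result = spaced.split()[1]

print(compact_result)  # USD
print(spaced_result)   # 6,811,034
```

The index only lines up with the currency code as long as the amount itself contains no whitespace.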

You can try this solution:

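# Select the parent div, not the inner span: the "USD" text node is a child of the div.
element = browser.find_element_by_class_name('indemandProgress-raised')
# childNodes[2] is the text node that sits between the two spans.
result = browser.execute_script('return arguments[0].childNodes[2].textContent;', element).strip()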

Note that the div has the following 5 child nodes:

  1. Whitespace-only text node (index 0)
  2. span node (index 1)
  3. Text node "USD" (index 2)
  4. Another span (index 3)
  5. Another whitespace-only text node (index 4)

You need the text content of the third child node, and childNodes[2].textContent gives you exactly that.
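The node layout can be checked without a browser. The sketch below parses the markup from the question with Python's standard `xml.dom.minidom`, on the assumption that the snippet is well-formed XML and that a browser's DOM splits it into the same child nodes. `describe` is a helper name made up for this example.

```python
from xml.dom import minidom

# Markup copied from the question's "Actual Format of Code in Browser" snippet.
HTML = (
    '<div class="indemandProgress-raised ng-binding">\n'
    '<span class="indemandProgress-raisedAmount ng-binding" gogo-test="raised">'
    '$6,811,034</span> USD <span class="ng-binding">total funds raised</span>\n'
    '</div>'
)

div = minidom.parseString(HTML).documentElement


def describe(node):
    """Summarise a DOM node as (kind, content); hypothetical helper."""
    if node.nodeType == node.TEXT_NODE:
        return ("text", node.data)
    return ("element", node.tagName)


child_summary = [describe(child) for child in div.childNodes]
for index, summary in enumerate(child_summary):
    print(index, summary)

# Index 2 is the text node between the two spans.
usd = div.childNodes[2].data.strip()
print(usd)  # USD
```

The same indices are what `childNodes[2]` refers to in the JavaScript snippet, provided the page's markup matches the question.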

Andersson
  • def good to know, but out of hundreds of cases `browser.find_element_by_xpath(...).text` always returns output akin to `$107,866 USD total funds raised`. – oldboy Jul 08 '18 at 19:55
  • in your example, `result = browser.execute_script('return arguments[0].childNodes[2].textContent;', element).strip()`, is `element` passed in as `arguments`? – oldboy Jul 08 '18 at 19:57
  • Yep. `arguments[0] == element`. It's just a simplified syntax for `browser.execute_script('return document.querySelector(".indemandProgress-raised").childNodes[2].textContent;').strip()` – Andersson Jul 08 '18 at 19:58
  • ok interesting. thats pretty cool how u can select an element with selenium, store it as a variable, and then pass it to javascript as the variable – oldboy Jul 08 '18 at 20:00
0

In plain XPath, you would select the text nodes like this:

//div[@class="indemandProgress-raised ng-binding"]/text()

In Selenium, however, you cannot use an XPath that returns attributes or text nodes, since its element lookups only support element nodes.

To get the text you want, you can use JavaScript to extract it from the text node, or select the parent element and then use .text:

result = browser.find_element_by_xpath('//div[contains(@class, "indemandProgress-raised")]').text.split()[1]

So, ultimately, it is not possible to use XPath /text() in Selenium, and you have to rely on the alternative methods outlined above.
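One way to sketch the "select the node and parse its text" route without relying on word positions is to remove the child elements' text from the parent's text. This is only an illustration: `own_text` is a hypothetical helper, and the Selenium calls in the comment are an assumption about how the inputs would be gathered.

```python
def own_text(parent_text, child_texts):
    """Return parent_text with the first occurrence of each child's text removed."""
    remaining = parent_text
    for child_text in child_texts:
        # Remove only the first occurrence so repeated words elsewhere survive.
        remaining = remaining.replace(child_text, "", 1)
    # Collapse the leftover whitespace.
    return " ".join(remaining.split())


# With Selenium, the inputs would come from something like:
#   parent = browser.find_element_by_xpath('//div[contains(@class, "indemandProgress-raised")]')
#   children = parent.find_elements_by_xpath('./*')
#   own_text(parent.text, [child.text for child in children])
print(own_text("$ 6,811,034 USD total funds raised",
               ["$ 6,811,034", "total funds raised"]))  # USD
```

This survives a space inside the amount, but it can still go wrong if a child's text also appears earlier in the parent's own text.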

wp78de
  • 1
    already tried it. doesn't work with selenium. i guess i should've added that to the question. myb. if you're interested: `WebDriverException: Message: TypeError: Expected an element or WindowProxy, got: [object Text] {}` – oldboy Jul 07 '18 at 23:39
  • you can select Text nodes with `.text`... anyways, the JS is useless since i can simply use `x = browser.find_element_by_xpath(...).text.split()[1]`. – oldboy Jul 08 '18 at 00:01
  • "it is not possible using XPath /text() in Selenium" – wp78de Jul 08 '18 at 00:17
  • so it's definitively impossible with xpath in a selenium environment? – oldboy Jul 08 '18 at 00:33
  • According to my (limited) knowledge, unfortunately, that's the case. Here is an [answer that states the same](https://stackoverflow.com/a/48706495/8291949). – wp78de Jul 08 '18 at 00:43
0

You can't do it with XPath alone, but you can use the JavaScript executor to get the text node. You didn't specify a language, so here's a method to do this in C#:

/// <summary>
/// Returns the text of the specified child text node.
/// </summary>
/// <param name="parentElement">The parent <see cref="IWebElement"/> of the desired text node.</param>
/// <param name="index">The zero-based index into the childNodes collection of parentElement.</param>
/// <returns>The text of the specified child text node.</returns>
public string GetChildTextNode(IWebElement parentElement, int index = 0)
{
    string s = (string)((IJavaScriptExecutor)driver).ExecuteScript("return arguments[0].childNodes[arguments[1]].textContent;", parentElement, index);
    return s.Trim();
}

In this case you would call it like

IWebElement e = driver.FindElement(By.CssSelector("div.indemandProgress-raised"));
string s = GetChildTextNode(e, 2);
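For readers following the Python snippets elsewhere in this thread, a rough equivalent of the C# helper might look like the sketch below. The function name `get_child_text_node` is made up for this example, and it assumes `driver` is any object exposing Selenium's `execute_script(script, *args)`.

```python
def get_child_text_node(driver, parent_element, index=0):
    """Return the stripped text of parent_element.childNodes[index]."""
    script = "return arguments[0].childNodes[arguments[1]].textContent;"
    # execute_script hands the extra arguments to JavaScript as
    # arguments[0], arguments[1], and so on.
    text = driver.execute_script(script, parent_element, index)
    return text.strip()


# Usage, assuming `browser` is a live WebDriver on the page from the question:
#   parent = browser.find_element_by_css_selector("div.indemandProgress-raised")
#   get_child_text_node(browser, parent, 2)
```

Passing the index as a script argument, as the C# version does, avoids building the JavaScript string by hand.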

JeffC