How to get text from parent element and exclude text from children (C# Selenium)

Question

Is it possible to get the text only from a parent element and not its children in Selenium?

Example: Suppose I have the following code:

<div class="linksSection">
  <a href="https://www.google.com/" id="google">Google Link
    <span class="helpText">This link will take you to Google's home page.</span>
  </a>
  ...
</div>

In C# (or whatever language), I will have:

string linkText = driver.FindElement(By.CssSelector(".linksSection > a#google")).Text;
Assert.AreEqual("Google Link", linkText, "Google Link fails text test.");

However, linkText will contain "Google LinkThis link will take you to Google's home page."

Without doing a bunch of string manipulation (such as getting the text of all the children and subtracting it from the parent's text), is there a way to get just the text from a parent element?

Simple answer is: no, you have to do the string manipulation. A more involved answer would be a combination of JavaScript and XPath substring-before() method. http://zvon.org/comp/r/ref-XPath_2.html#Functions~substring-before But that is still string manipulation, just at a different level. — SiKing, Mar 09 '15 at 16:42
Thanks! My current solution is to do a simple "string.Contains()" verification. But that may be error prone in some situations. — Machtyn, Mar 09 '15 at 17:31
possible duplicate of [How to get text of an element in Selenium WebDriver (via the Python api) without including child element text?](http://stackoverflow.com/questions/12325454/how-to-get-text-of-an-element-in-selenium-webdriver-via-the-python-api-without) — Louis, Mar 10 '15 at 09:49

score 16 · Accepted Answer · edited Nov 27 '17 at 15:36

This is a common problem in Selenium, since you cannot directly target text nodes: your XPath expressions and CSS selectors have to point to an actual element.

Here is the list of possible solutions for your problem:

Get the parent element's text, then get each child's text and remove it from the parent's text. What is left is the desired text, Google Link in your case.
If you only need Google Link for an assertion, it may be enough to check that the parent's text starts with Google Link. See StringAssert.StartsWith().

Get the outerHTML of the parent element and feed it to an HTML parser, such as Html Agility Pack. Something along these lines:

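// Requires the HtmlAgilityPack NuGet package (using HtmlAgilityPack;).
string outerHTML = driver.FindElement(By.CssSelector(".linksSection > a#google")).GetAttribute("outerHTML");

HtmlDocument html = new HtmlDocument();
html.LoadHtml(outerHTML);

// SelectSingleNode returns one HtmlNode; SelectNodes would return a collection.
HtmlNode a = html.DocumentNode.SelectSingleNode("//a[@id='google']");
// "text()" selects the first text node that is a direct child of <a>.
HtmlNode text = a.SelectSingleNode("text()");

Console.WriteLine(text.InnerText.Trim());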
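
The first option in the list above (removing each child's text from the parent's text) comes down to a small string helper. Below is a minimal sketch of that idea, written in Java rather than C# and kept free of Selenium so it can run on its own. The class name OwnTextBySubtraction and the method stripChildText are hypothetical, and the approach is naive: it removes only the first occurrence of each child's text, so it can misfire if the same words also appear in the parent's own text.

```java
import java.util.List;

// Hypothetical helper, not part of Selenium: subtracts child text from parent text.
final class OwnTextBySubtraction {

    // Removes the first occurrence of each child's text from the parent's text,
    // then trims the surrounding whitespace.
    static String stripChildText(String parentText, List<String> childTexts) {
        String remaining = parentText;
        for (String childText : childTexts) {
            if (childText == null || childText.isEmpty()) {
                continue; // nothing to subtract for this child
            }
            int at = remaining.indexOf(childText);
            if (at >= 0) {
                remaining = remaining.substring(0, at)
                        + remaining.substring(at + childText.length());
            }
        }
        return remaining.trim();
    }

    public static void main(String[] args) {
        // Assumed input: the combined text the parent <a> reports.
        String parentText = "Google Link This link will take you to Google's home page.";
        List<String> childTexts =
                List.of("This link will take you to Google's home page.");
        System.out.println(stripChildText(parentText, childTexts)); // prints "Google Link"
    }
}
```

In a real Selenium Java test, parentText would presumably come from element.getText() and the child texts from calling getText() on each result of element.findElements(By.xpath("./*")).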

edited Nov 27 '17 at 15:36 by carla · answered Mar 09 '15 at 17:49 by alecxe

How can we do it in Selenium Java? I'm trying to find it out, but couldn't find LoadHtml function there... can anyone help me here? – zeal Jun 28 '16 at 12:16
@zeal LoadHtml is part of Html Agility Pack, which is only available for C# as far as I am aware. – mrGreenBrown Nov 29 '18 at 06:56
@alecxe Thank you for pointing me in the right direction, you saved me a ton of time! I also noticed that HtmlAgilityPack has GetDirectInnerText method which does the trick. – user1106591 Jun 01 '20 at 01:24
AgilityPack only works if the source code is complete; like any web scraper (e.g. BeautifulSoup), it does not work if the source code is truncated. – NewBie1234 Feb 16 '21 at 15:46

score 0 · Answer 2 · edited Jun 23 '23 at 14:30

There are three ways to do the job.

Replace the unwanted child-node text with an empty string; the logic is the same as in the other answers.
Use JavaScript to collect only the element's direct text nodes:

    // The CSS selector is passed in as arguments[0]; nodeType 3 is a text node.
    private static final String OWN_TEXT_JS =
        "var el = document.querySelector(arguments[0]), parts = [];"
        + "for (var i = 0; i < el.childNodes.length; i++) {"
        + "  if (el.childNodes[i].nodeType === 3) { parts.push(el.childNodes[i].nodeValue); }"
        + "}"
        + "return parts.join('');";

    // path is a CSS selector string; webDriver is the WebDriver instance.
    Object result = ((JavascriptExecutor) webDriver).executeScript(OWN_TEXT_JS, path);
    return (result instanceof String) ? (String) result : null;

Use an HTML parser; in Java, that is jsoup.

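    // Option A: parse only the element's own outerHTML.
    WebElement element = webDriver.findElement...
    String result = Jsoup.parse(element.getAttribute("outerHTML")).selectFirst(element.getTagName()).ownText();

    // Option B: parse the whole page, then select by CSS path (csspath).
    WebElement webElement = webDriver.findElement(By.xpath("/html"));
    String pageResult = Jsoup.parse(webElement.getAttribute("outerHTML")).selectFirst(csspath).ownText();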
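
The JavaScript and jsoup variants above share one idea: walk the element's direct children and keep only the text nodes. The sketch below shows that loop in isolation using the JDK's built-in XML DOM parser, so it runs without Selenium or jsoup. It assumes the markup is well-formed XML, which real-world HTML often is not, so treat it as an illustration of the technique rather than a replacement for jsoup. The class name OwnTextDemo and the method ownText are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: extracts an element's own text from well-formed markup.
final class OwnTextDemo {

    // Concatenates the values of the root element's direct text-node children,
    // skipping any text that belongs to child elements.
    static String ownText(String wellFormedMarkup) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedMarkup)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup is not well-formed XML", e);
        }
        StringBuilder own = new StringBuilder();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) { // same check as nodeType === 3 in JS
                own.append(child.getNodeValue());
            }
        }
        return own.toString().trim();
    }

    public static void main(String[] args) {
        String markup = "<a href=\"https://www.google.com/\" id=\"google\">Google Link\n"
                + "  <span class=\"helpText\">This link will take you to Google's home page.</span>\n"
                + "</a>";
        System.out.println(ownText(markup)); // prints "Google Link"
    }
}
```

In a Selenium test the markup string would be the element's outerHTML attribute, as in the jsoup snippets above; jsoup remains the safer choice there because it tolerates malformed HTML.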
