0

I want to select all the elements on the page that contain any text.
Only elements that actually contain text themselves, not parent elements whose text comes only from their child elements.
This XPath matches elements containing any non-empty text:

//*[text() != ""]

However, this

List<WebElement> list = driver.findElements(By.xpath("//*[text() != '']"));

gives me a list of all elements that contain text either themselves or in their child elements.
I can iterate over this list with something like the following to collect the elements that actually contain text themselves into the `real` list:

List<WebElement> real = new ArrayList<>();
JavascriptExecutor js = (JavascriptExecutor) driver;
for (WebElement element : list) {
    // Concatenate only the element's direct text nodes (requires jQuery on the page)
    String text = (String) js.executeScript("""
    return jQuery(arguments[0]).contents().filter(function() {
        return this.nodeType == Node.TEXT_NODE;
    }).text();
    """, element);
    if (text.length() > 0) {
        real.add(element);
    }
}

But this is a kind of workaround.
Is there a way to get the list of elements that actually contain text themselves directly, or at least more elegantly?

Prophet

4 Answers

1
    List<WebElement> elementsWithOwnText = new ArrayList<WebElement>();
    List<WebElement> allElements = driver.findElements(By.xpath("//*"));
    for (WebElement element: allElements) {
        List<WebElement> childElements = element.findElements(By.xpath(".//*"));
        String text = element.getText();
        if (childElements.size() == 0 && text.length() > 0) {
            elementsWithOwnText.add(element);
        }
    }

Be aware of org.openqa.selenium.StaleElementReferenceException. While looping over allElements, any of them may no longer be attached to the page document (for example, because of dynamic content).

pburgr
  • I'm not sure your solution is better than mine. Maybe even worse. You are getting ALL the elements on the entire page and then checking children elements for all of them while I started with elements containing texts only... – Prophet Jul 27 '21 at 12:50
  • The only difference is in the checking mechanism that validates that the element itself contains text, not its children. – Prophet Jul 27 '21 at 12:51
  • Also I asked for direct way to get all those elements, if it is possible. Not for algorithm to remove parent elements. – Prophet Jul 27 '21 at 12:58
  • I'm sorry, do you need to get the text of parents only or children only? – CCC Jul 27 '21 at 13:44
  • I want to get all the elements containing texts. Elements themselves. Only the elements actually containing texts, not those who actually have texts in their child elements only – Prophet Jul 27 '21 at 14:13
1

You can try this: it selects all leaf elements with text.

List<WebElement> list = driver.findElements(By.xpath("//*[not(child::*) and text()]"));
        for (WebElement webElement : list)
            System.out.println(webElement.getText());
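
For comparison, here is a minimal sketch that runs without a browser. It uses the JDK's own `javax.xml.xpath` engine on a small, made-up XML snippet instead of Selenium, so the class name `OwnTextXPathDemo`, the helper `matchedTags`, and the sample markup are all assumptions for illustration. It contrasts three expressions: the one from the question, the leaf-only one above, and a variant that tests each direct text node with `normalize-space()`. Whether the third expression behaves the same way on a real page through `By.xpath` would still need to be checked in the browser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OwnTextXPathDemo {

    // Hypothetical helper: returns the tag names of all elements in `xml`
    // that are selected by `xpath`.
    static List<String> matchedTags(String xml, String xpath) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> tags = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                tags.add(nodes.item(i).getNodeName());
            }
            return tags;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample: <div> has its own text plus a child, <section> has only
        // indentation whitespace around its child, <p> and <b> are leaves.
        String xml = "<body>\n  <div>own text <b>bold</b></div>\n"
                + "  <section>\n    <p>leaf</p>\n  </section>\n</body>";

        // Whitespace-only text nodes also count as "not equal to the empty string"
        System.out.println(matchedTags(xml, "//*[text() != '']"));
        // Leaf elements only: misses <div>, which has its own text and a child
        System.out.println(matchedTags(xml, "//*[not(child::*) and text()]"));
        // At least one direct text node that is non-empty after whitespace normalization
        System.out.println(matchedTags(xml, "//*[text()[normalize-space()]]"));
    }
}
```

In this sketch the first expression also selects `body` and `section`, because the indentation between their child elements forms whitespace-only text nodes. That may be why the query in the question appears to return parents that have no text of their own.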
CCC
  • Looks like this is what I looked for. The XPath itself, without the internal `for` loop. – Prophet Jul 27 '21 at 14:16
  • on a second thought it may be wrong. This xpath selects leaf elements with text but you could have some parent elements with actual text. If i am not wrong you want all the elements with actual text. – CCC Jul 27 '21 at 14:20
  • Right.... Well, can you think about such solution? Will leave the acceptance for now, waiting for a correct solution. – Prophet Jul 27 '21 at 14:22
  • Sorry, but it seems undoable. I tried `innerHTML`, `innerText`, `outerHTML`, `outerText`... I tried replacing tags, etc. Also, you may have some HTML code like this: `

    this is a test

    ` or `

    this is
    another example

    ` How can you decide what's the plain text? I reckon that you should get the texts and work with them. Selenium doesn't even support the `//text()` XPath, since it returns a text node while Selenium needs a WebElement to be returned... Sorry :( – CCC Jul 27 '21 at 15:16
  • I understand. It was quite predictable. I saw similar questions but no solution I asked for. So I'm sorry, but I removed the acceptation. Will upvote you on some other place instead :) – Prophet Jul 27 '21 at 15:19
  • Don't worry, I really hope someone finds a solution since it's very interesting. – CCC Jul 27 '21 at 15:20
  • Me too. Not sure it's possible without looping like in my question or in pburgr answer – Prophet Jul 27 '21 at 15:24
  • maybe this will help: https://stackoverflow.com/questions/28945692/how-to-get-text-from-parent-element-and-exclude-text-from-children-c-selenium – CCC Jul 27 '21 at 15:54
1

Until you find the XPath that you need, as a temporary solution I would recommend trying the iteration below too (even though it is not as efficient as a direct XPath).

In my case it took 1 minute to evaluate 700 nodes with text, and it returned 152 elements that have their own text:

public static List<WebElement> getElementsWithText(WebDriver driver) {
    return driver.findElements(By.xpath("//*[normalize-space() != '']"))
            .stream().filter(element -> doesParentHaveText(element))
            .collect(Collectors.toList());
}

private static boolean doesParentHaveText(WebElement element) {
    try {
        String text = element.getText().trim();
        List<WebElement> children = element.findElements(By.xpath("./*"));

        for (WebElement child: children) {
            text = text.replace(child.getText(), "").trim();
        }

        // replaceAll takes a regex; String.replace would look for the literal pattern text
        return text.trim().replaceAll("[\\n\\t\\r]", "").length() > 0;
    } catch (WebDriverException e) {
        return false; // in case something goes wrong while reading text; you can replace the return false with a thrown error
    }
}
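
The text-subtraction idea above can be tried without a browser. The sketch below is only an illustration under stated assumptions: `OwnTextCheck` and `hasOwnText` are hypothetical names, and plain strings stand in for the values that `getText()` would return for an element and its direct children.

```java
import java.util.List;

public class OwnTextCheck {

    // Hypothetical helper: strips each child's text from the parent's full text
    // and reports whether anything other than whitespace remains.
    static boolean hasOwnText(String fullText, List<String> childTexts) {
        String remaining = fullText;
        for (String childText : childTexts) {
            if (!childText.isEmpty()) {
                // Literal replacement; removes every occurrence of the child's text
                remaining = remaining.replace(childText, "");
            }
        }
        // Regex replacement: drop all whitespace before checking what is left
        return !remaining.replaceAll("\\s+", "").isEmpty();
    }

    public static void main(String[] args) {
        // Parent contributes "Price: " itself
        System.out.println(hasOwnText("Price: 10 EUR", List.of("10 EUR")));
        // All text comes from the child
        System.out.println(hasOwnText("10 EUR", List.of("10 EUR")));
        // Parent's own text equals the child's text, so it is removed as well
        System.out.println(hasOwnText("OK OK", List.of("OK")));
    }
}
```

The last call shows a limitation of this approach: when an element's own text is identical to a child's text, the subtraction removes both and the element is reported as having no text of its own.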
doris
  • Thanks for your answer. This is still similar to what I mentioned in the question and to what pburgr answered, but thanks for the additional approach. – Prophet Jul 28 '21 at 09:00
0

this could help: source

List<String> elements = driver.findElements(By.xpath("//a")).stream().map(productWebElement -> productWebElement.getText()).distinct().collect(Collectors.toList());
        
    // Print count of product found
    System.out.println("Total unique product found : " + elements.size());
        
    // Printing product names
    System.out.println("All product names are : ");
    elements.forEach(name -> System.out.println(name));
CCC