
I'm looking to implement a general localisation test using WebDriver, and my idea is:

  1. Select all elements that contain text
  2. Use a Java language identification library to verify that the text is in some specific language

I looked up various ways to select elements by their text, but everything documented seems to show how to locate elements with some specific text using contains() or text().

I thought the following would work:

//*[contains(text(), '')]

but that selects every element, whether it has text or not. It also selects elements in the header. I want to select all visible text on the page, extract that text, and pass it through the language identification library element by element.

amadain

1 Answer


You can use this XPath:

//*[text() != ""]

This matches every element that has a non-empty text node.
So

List<WebElement> list = driver.findElements(By.xpath("//*[text() != '']"));

will give you a list of all web elements on the page that contain text.
Update:
If you want only the elements that hold text themselves, excluding those whose text appears only in their children, you can filter the list with this code (it assumes jQuery is loaded on the page under test):

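// Requires Java 15+ (text blocks) and jQuery available on the page under test.
JavascriptExecutor js = (JavascriptExecutor) driver;
List<WebElement> real = new ArrayList<>();
for (WebElement element : list) {
    // Concatenate only the element's own text nodes, ignoring text inside child elements.
    String text = (String) js.executeScript("""
        return jQuery(arguments[0]).contents().filter(function() {
            return this.nodeType == Node.TEXT_NODE;
        }).text();
        """, element);
    // Keep the element only if it directly owns some text.
    if (text != null && text.length() > 0) {
        real.add(element);
    }
}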

The filtered elements end up in the real list.
The idea is from here, translated from Python to Java according to this syntax.
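As a possible alternative that does not depend on jQuery, the "own text only" filter can be expressed in XPath itself: `text()[normalize-space()]` is true only for elements that directly own a text node with non-whitespace content. The sketch below is not part of the original answer. It checks the expression against a small hand-written sample document using the JDK's built-in XPath engine, so it runs without a browser. The class name, the sample markup, and the exclusion of script and style elements are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OwnTextXPathDemo {
    // Elements under <body> that directly own at least one non-whitespace
    // text node; <script> and <style> are skipped (an assumption, adjust as needed).
    static final String OWN_TEXT_XPATH =
        "//body//*[not(self::script or self::style)][text()[normalize-space()]]";

    // Hypothetical sample page, written as well-formed XML so the JDK parser accepts it.
    static final String SAMPLE =
        "<html><head><title>Ignored title</title><style>p { color: red; }</style></head>"
      + "<body><div><h1>Hello</h1><p>Some <span>nested</span> text</p>"
      + "<div>  </div><script>var x = 1;</script></div></body></html>";

    // Returns the tag names of the matching elements, in document order.
    static List<String> ownTextElementNames(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            OWN_TEXT_XPATH, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws XPathExpressionException {
        System.out.println(ownTextElementNames(SAMPLE));
    }
}
```

With WebDriver, the same expression could presumably be passed to `driver.findElements(By.xpath(...))`, and each result checked with `WebElement.isDisplayed()` to keep only visible text. That part is untested here and may need adjusting for the page in question.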

Prophet
  • It's better than mine, but in the inspector it still hits the header, the title and all of the style elements in the header. Is there a way to limit it to the body? However, inspired by you, I tried //body//*[text() != ""] and this worked – amadain Jul 27 '21 at 09:26
  • So, have you found what you were looking for, or do you still want to improve it to get only the elements that contain text themselves, not in their child nodes? – Prophet Jul 27 '21 at 09:30
  • Yes, I would rather get the elements that contain the text themselves – amadain Jul 27 '21 at 10:01
  • I didn't find a better solution than this. Please see the updated answer – Prophet Jul 27 '21 at 11:44
  • I see, thanks. I already upvoted you, so can't do that anymore – Prophet Jul 27 '21 at 13:17