I want to find the elements that contain the target text, either directly or in any of their descendants.

Sample data:

library(magrittr)
library(xml2)
library(rvest)    
html <- "<button><span><span>as</span></span></button><button><p>ds</p></button><input><span><span>as</span></span><input>"
doc <- html %>% read_html()
doc %>% html_nodes(xpath = "//*[self::button and //*[contains(text(), 'as')]]")

Please note that my real data is more complex: I check for 10+ strings that could appear within the target elements. I would therefore prefer "//*[self::button or self::input]" over "//button", and so on. Moreover, the target text could be in the target element (button or input) itself or in any of its descendants.

Desired Output:

The first button and the input.

What I tried:

doc %>% html_nodes(xpath = "//*[(self::button or self::input) and //*[contains(text(), 'as')]]")
doc %>% html_nodes(xpath = "//*[(self::button or self::input)]//*[contains(text(), 'as')]")

See also: "How do I select child elements of any depth using XPath?"

Tlatwork
  • Can you please share the output you are getting from these commands as well as the output you would like to be getting (ie what's wrong with these versions specifically)? – MrFlick Mar 04 '21 at 19:08
  • I think the problem is that the HTML tag `<input>` is a void element: it cannot contain child elements, so nesting a span inside it is invalid syntax. You can see the parsed structure with `html_structure(doc)`. The `input` node does not "contain" the "as" text; the parser makes the span a sibling node, not a child node. – MrFlick Mar 04 '21 at 19:21
  • Thanks, my example is bad. Let me find a better one. – Tlatwork Mar 04 '21 at 19:24
  • Actually, I think your info solved it. Thanks! – Tlatwork Mar 04 '21 at 20:10

1 Answer

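The problem with your expressions is that the predicate uses an absolute location path instead of a relative one. Inside a predicate, `//*` starts at the document root, so the test is true for every candidate as soon as any element anywhere in the document contains 'as'. `.//*` starts at the element being tested and looks only at its descendants. An example that preserves your style: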

//*[(self::button or self::input) and .//*[contains(text(), 'as')]]
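
A minimal sketch of the difference, reusing the sample HTML from the question. The variable names `absolute` and `relative` are only illustrative. Given the comment above about `<input>` being a void element, the relative version should match only the first button on this sample, while the absolute version matches every button and input.

```r
library(xml2)
library(rvest)

# Sample HTML from the question
html <- "<button><span><span>as</span></span></button><button><p>ds</p></button><input><span><span>as</span></span><input>"
doc <- read_html(html)

# Absolute path in the predicate: //* searches the whole document, so the
# condition holds for every button/input once any element contains 'as'
absolute <- html_nodes(
  doc,
  xpath = "//*[(self::button or self::input) and //*[contains(text(), 'as')]]"
)

# Relative path in the predicate: .//* searches only below the element
# that is currently being tested
relative <- html_nodes(
  doc,
  xpath = "//*[(self::button or self::input) and .//*[contains(text(), 'as')]]"
)

length(absolute)
html_name(relative)
```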

A more idiomatic XPath version:

//*[self::button|self::input][.//text()[contains(.,'as')]]
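
Since the question mentions checking 10+ strings, the predicate can be assembled from a character vector instead of being typed by hand. The following is a sketch under assumptions, not a definitive implementation: the `targets` vector and the third button in the sample HTML are made up for illustration, and the search strings are assumed to contain no single quotes. Because `.//text()` also covers text nodes that are direct children of the tested element, this form matches text in the element itself as well as in its descendants.

```r
library(xml2)
library(rvest)

# Illustrative HTML: the third button holds its text directly
html <- "<button><span><span>as</span></span></button><button><p>ds</p></button><button>xy</button>"
doc <- read_html(html)

# Hypothetical search strings (assumed to contain no single quotes)
targets <- c("as", "xy")

# One contains() test per string, joined with " or "
conditions <- paste0("contains(., '", targets, "')", collapse = " or ")

# Keep buttons/inputs that have at least one matching text node
# at any depth below them (including direct text children)
xpath <- paste0(
  "//*[self::button or self::input][.//text()[", conditions, "]]"
)

matches <- html_nodes(doc, xpath = xpath)
html_name(matches)
```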

Alejandro