
Consider this simple example:

library(xml2)

x <- read_xml("<body>
  <p>Some <b>text</b>.</p>
  <p>Some <b>other</b> <b>text</b>.</p>
  <p>No bold here!</p>
</body>")

Now, I want to find all the parents of the nodes containing the string "other". To do so, I run:

> xml_find_all(x, "//b[contains(.,'other')]//parent::*")

{xml_nodeset (2)}
[1] <p>Some <b>other</b> <b>text</b>.</p>
[2] <b>other</b>

I do not understand why I also get the <b>other</b> element. In my view there is only one parent, which is the first node in the result.

Is this a bug?

kjhughes
ℕʘʘḆḽḘ

1 Answer


Change

//b[contains(.,'other')]//parent::*

which walks descendant-or-self::node() and then parent (so the text node inside b also contributes its parent, which is <b>other</b> itself), to

//b[contains(.,'other')]/parent::*

which selects purely along parent, to eliminate <b>other</b> from the selection.
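A minimal sketch of the difference, assuming the sample document from the question. The variable names `long_form` and `fixed` are illustrative only. In XPath 1.0, `//` abbreviates `/descendant-or-self::node()/`, so the first query below is the original one written out in full.

```r
library(xml2)

x <- read_xml("<body>
  <p>Some <b>text</b>.</p>
  <p>Some <b>other</b> <b>text</b>.</p>
  <p>No bold here!</p>
</body>")

# The original query with `//` spelled out: the step visits the matching <b>
# and the text node inside it, then takes the element parent of each.
long_form <- xml_find_all(
  x, "//b[contains(.,'other')]/descendant-or-self::node()/parent::*"
)
xml_name(long_form)

# With a single slash, only the parent of the matching <b> is selected.
fixed <- xml_find_all(x, "//b[contains(.,'other')]/parent::*")
xml_name(fixed)
```

The first call should report both `p` and `b`, matching the two-node result in the question; the second should report only `p`.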

Or, better yet, use this XPath:

//p[b[contains(.,'other')]]

if you want to select all p elements with a b child whose string-value contains an "other" substring, or

//p[b = 'other']

if b's string-value is supposed to equal "other". See also "What does contains() do in XPath?"
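As a rough illustration of how the two predicates differ, the sketch below runs both against the question's document and against a second, hypothetical document `y` (not from the question) in which the b element holds "another".

```r
library(xml2)

x <- read_xml("<body>
  <p>Some <b>text</b>.</p>
  <p>Some <b>other</b> <b>text</b>.</p>
  <p>No bold here!</p>
</body>")

# Both predicates select the second <p> in the question's document.
by_contains <- xml_find_all(x, "//p[b[contains(.,'other')]]")
by_equals   <- xml_find_all(x, "//p[b = 'other']")
xml_text(by_contains)
xml_text(by_equals)

# Hypothetical document: "another" contains "other" but does not equal it.
y <- read_xml("<body><p>Some <b>another</b> word.</p></body>")
n_contains <- length(xml_find_all(y, "//p[b[contains(.,'other')]]"))
n_equals   <- length(xml_find_all(y, "//p[b = 'other']"))
```

On `y`, the substring test should still match one p element, while the equality test should match none.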

kjhughes
  • ohhh! thanks! so you say that using `//` in `//b[contains(.,'other')]//parent::*` goes two levels down which is why the self is selected (you are the parent of your child)? – ℕʘʘḆḽḘ Feb 08 '18 at 16:25
  • No, there are no elements below `b`. `//` is *descendant-or-self*; and you don't want *self* -- you just want `/parent::*` without *self*. Note that the later two XPath suggestions are simpler. – kjhughes Feb 08 '18 at 16:41
  • Thanks again. I did not know that `p[b = 'other']` would directly look at the children of `p`. I thought the `[` operator was for testing attributes of the current node – ℕʘʘḆḽḘ Feb 08 '18 at 16:48
  • can I ask just a follow up? Assume I am interested in the parents that, say, have some characteristics, such as containing the string 'some'. How would you do that? – ℕʘʘḆḽḘ Feb 08 '18 at 17:22
  • It'd be easier if you asked a new question and gave specific XML and specific targets for selection. Do watch out for the difference between [***testing text() nodes vs string values in XPath***](https://stackoverflow.com/q/34593753/290085). – kjhughes Feb 08 '18 at 17:30
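For the follow-up about keeping only parents with a further property, one possible sketch is to chain a second predicate onto the p step. This assumes the question's sample document and that "containing the string" means a case-sensitive substring test on the parent's string-value; the names `with_some` and `lower_some` are illustrative.

```r
library(xml2)

x <- read_xml("<body>
  <p>Some <b>text</b>.</p>
  <p>Some <b>other</b> <b>text</b>.</p>
  <p>No bold here!</p>
</body>")

# Parents with a <b> child containing 'other' whose own string-value
# also contains 'Some'.
with_some <- xml_find_all(x, "//p[b[contains(.,'other')]][contains(.,'Some')]")
xml_text(with_some)

# contains() is case-sensitive, so lowercase 'some' does not match "Some".
lower_some <- xml_find_all(x, "//p[b[contains(.,'other')]][contains(.,'some')]")
length(lower_some)
```

Whether the test should apply to the string-value or to individual text() nodes depends on the real data, as the linked question discusses.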