Only select text directly in node, not in child nodes

Question

How does one retrieve the text in a node without selecting the text in the children?

<div id="comment">
     <div class="title">Editor's Description</div>
     <div class="changed">Last updated: </div>
     <br class="clear">
     Lorem ipsum dolor sit amet.
</div>

In other words, I want "Lorem ipsum dolor sit amet." rather than "Editor's DescriptionLast updated: Lorem ipsum dolor sit amet."

score 52 · Accepted Answer · edited Jul 05 '16 at 12:19

In the provided XML document:

<div id="comment">
      <div class="title">Editor's Description</div>
      <div class="changed">Last updated: </div>
      <br class="clear" />
      Lorem ipsum dolor sit amet. 
</div>

the top element, /div, has four child nodes that are text nodes. The first three of them are whitespace-only; the fourth is the one that is wanted.
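
To check that count without an XPath engine, here is a minimal sketch in Python using the standard-library `xml.dom.minidom` (my choice for illustration, not something the question mentions). It assumes the `<br>` is written as a self-closing tag so that the sample parses as XML, and the helper name `text_children` is made up for this example.

```python
from xml.dom import minidom

# The sample from the question, with <br> self-closed so it is well-formed XML
# (an assumption made for this sketch).
XML = """<div id="comment">
      <div class="title">Editor's Description</div>
      <div class="changed">Last updated: </div>
      <br class="clear" />
      Lorem ipsum dolor sit amet.
</div>"""


def text_children(element):
    """Return the text nodes that are direct children of element."""
    return [n for n in element.childNodes if n.nodeType == n.TEXT_NODE]


div = minidom.parseString(XML).documentElement
nodes = text_children(div)

# List each direct text-node child and say whether it is whitespace-only.
for position, node in enumerate(nodes, start=1):
    kind = "has text" if node.data.strip(" \t\r\n") else "whitespace-only"
    print(position, repr(node.data), kind)
```

The text inside the two inner `div` elements never shows up here, because those text nodes are children of the inner elements rather than of the outer `div`.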

Use:

/div/text()[last()]

This is different from:

/div/text()

The latter may (depending on whether whitespace-only nodes are preserved by the XML parser) select all 4 text nodes, but you only want the last of them.

An alternative, for when you don't know exactly which text node you want, is:

/div/text()[normalize-space()]

This selects all text-node-children of /div that are not whitespace-only text nodes.
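
As a rough illustration of how the three expressions differ, the sketch below emulates them in plain Python over a `xml.dom.minidom` tree. It does not evaluate XPath; it only mirrors the selection logic, and it assumes the sample is written with a self-closing `<br />` and that the parser keeps whitespace-only text nodes. The variable names are my own.

```python
from xml.dom import minidom

# The sample from the question, with <br> self-closed so it is well-formed XML
# (an assumption made for this sketch).
XML = """<div id="comment">
      <div class="title">Editor's Description</div>
      <div class="changed">Last updated: </div>
      <br class="clear" />
      Lorem ipsum dolor sit amet.
</div>"""

div = minidom.parseString(XML).documentElement

# Mirrors /div/text(): every text node that is a direct child of div.
all_text = [n for n in div.childNodes if n.nodeType == n.TEXT_NODE]

# Mirrors /div/text()[last()]: only the last of those text nodes.
last_text = all_text[-1:]

# Mirrors /div/text()[normalize-space()]: keep a text node only if something
# remains after stripping XML whitespace (space, tab, CR, LF).
non_blank = [n for n in all_text if n.data.strip(" \t\r\n")]

print(len(all_text), len(last_text), len(non_blank))
print([n.data.strip(" \t\r\n") for n in non_blank])
```

On this sample the last two selections contain the same single node; they would differ if the wanted text were split across several non-empty text nodes or were not the last child.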

edited Jul 05 '16 at 12:19

Sofia

answered Dec 19 '10 at 17:03

Dimitre Novatchev

@Dimitre, the question is to select the text without *child* nodes, the first suggestion by you doesn't do this. – Lucero Dec 19 '10 at 17:08
@Lucero: Why? I haven't suggested the use of the `descendant::` axis or the `//` abbreviation. The first expression selects just one text node: the last child text node of `/div`. The alternative selects any child text node of `/div` that is not whitespace-only. – Dimitre Novatchev Dec 19 '10 at 17:14
@Dimitre, simply because nothing says that the wanted text will be the last node? – Lucero Dec 19 '10 at 17:16
@Lucero: I have edited my answer to make it more clear. Hope you understand it now. – Dimitre Novatchev Dec 19 '10 at 17:28
@Dimitre, the question was to get the text without the text of the child nodes. Getting the last text node only is working for the given sample, but not answering the question in general. – Lucero Dec 19 '10 at 17:28
@Lucero: I think that the edited answer meets your objections -- it explains the two alternatives one has: either know *exactly* which node you want to select, or select all text nodes that are not white-space only. Both expressions avoid selecting whitespace-only text nodes -- something that may happen using your suggested solution. Do note that the OP really wants only non-whitespace-only text nodes. – Dimitre Novatchev Dec 19 '10 at 17:33
@Dimitre, in fact the white space stripping was useful as well, thanks to both – Moak Dec 19 '10 at 17:38
I just don't get why both of the solutions don't work for me in Firefox with XPather, but `//div/text()[normalize-space() and parent::div[@id='comment']]` is fine. – István Ujj-Mészáros Dec 19 '10 at 17:45
@styu: Then you are evaluating the XPath expressions against a different XML document (not against the provided XML document) – Dimitre Novatchev Dec 19 '10 at 17:52
@Dimitre I think it's an issue with XPather. Your XPath Visualizer and another one work fine, thanks. – István Ujj-Mészáros Dec 19 '10 at 18:53
This does not solve the answer for me. I need the xpath result to be in the form of a webelement, not a String, and so using /text() is not an option. – djangofan Jun 04 '15 at 22:44
@djangofan, text() selects all text-node children of the current node -- not strings as you believe. As for "webelements", no such thing exists in XPath. – Dimitre Novatchev Jun 04 '15 at 22:50
@SeanDuggan, Yes, XPath is a very elegant and powerful language. – Dimitre Novatchev Nov 04 '15 at 02:48

score 17 · Answer 2 · answered Dec 19 '10 at 16:56

Just select text() instead of .:

div/text()

On the given XML fragment, this returns:

Lorem ipsum dolor sit amet.

answered Dec 19 '10 at 16:56

Lucero

score 1 · Answer 3 · answered Apr 25 '17 at 15:03

How about this:

$doc/node()[3]/text()

This assumes $doc holds the XML.

answered Apr 25 '17 at 15:03

bosari
