using XPath: how to exclude text in nested elements

Question

If I have some HTML like the following:

<div class=unique_id>
  <h1 class="parseasinTitle">
    <span> Game Title </span>
  </h1>
  Game Developer
</div>

Is there a way to use XPath to get JUST the "Game Developer" part of the text? From searching around, I tried:

//div[@class='unique_id' and not(self::h1/span)]

But that still gives me the entire text "Game Title Game Developer".
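Why the whole string comes back: the expression selects the div element itself, and an element's text (its XPath string-value) is the concatenation of all of its descendant text nodes, including the one inside the nested h1/span. The sketch below illustrates this with Python's standard-library xml.dom.minidom rather than a real XPath engine. The helper string_value is hypothetical, and the class attribute is quoted so the snippet parses as XML.

```python
from xml.dom import minidom

# The question's markup; the class value is quoted so minidom can parse it as XML.
SAMPLE = """<div class="unique_id">
  <h1 class="parseasinTitle">
    <span> Game Title </span>
  </h1>
  Game Developer
</div>"""


def string_value(node):
    """Concatenate all descendant text nodes in document order.

    This mirrors how XPath defines the string-value of an element node.
    """
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(string_value(child) for child in node.childNodes)


div = minidom.parseString(SAMPLE).documentElement
# Collapse whitespace runs for display; the nested h1/span text is still included.
print(" ".join(string_value(div).split()))  # → Game Title Game Developer
```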

This is the example I was trying to follow: http://stackoverflow.com/questions/14620745/xpath-for-all-nested-text-except-n-nested-tags-text In it, the asker wants all the text except the text in the last two elements. I thought that was similar to my use case, since I also want all the text except what appears in a specific tag. I do see the mistake in using 'self', though. If I modify my XPath to be //div[@class='unique_id']/*[not(self::h1/span)], I get nothing back, with or without /text(). – theFakeGramita, Aug 13 '13 at 21:48

score 7 · Accepted Answer · answered Aug 13 '13 at 20:31


div[@class = 'unique_id']/text()[not(normalize-space() = '')]

or

div[@class = 'unique_id']/text()[last()]

depending on context.

Note that you still have to trim the resulting text node.
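To make the two expressions concrete, here is a minimal Python sketch that emulates them by hand with the standard-library xml.dom.minidom (no XPath engine is involved). The helpers normalize_space and child_text_nodes are hypothetical stand-ins for normalize-space() and the text() step, and normalize_space only approximates the XPath function.

```python
from xml.dom import minidom

# The question's markup, with the class value quoted so it parses as XML.
SAMPLE = """<div class="unique_id">
  <h1 class="parseasinTitle">
    <span> Game Title </span>
  </h1>
  Game Developer
</div>"""


def normalize_space(s):
    # Approximates XPath normalize-space(): strip both ends and collapse
    # internal whitespace runs to one space (str.split() also treats some
    # non-ASCII whitespace as separators, which XPath does not).
    return " ".join(s.split())


def child_text_nodes(element):
    # The relative step text(): direct child text nodes only.
    return [n for n in element.childNodes if n.nodeType == n.TEXT_NODE]


div = minidom.parseString(SAMPLE).documentElement

# text()[not(normalize-space() = '')]: drop whitespace-only text nodes.
non_blank = [n for n in child_text_nodes(div) if normalize_space(n.data) != ""]

# text()[last()]: the last child text node, whatever it contains.
last = child_text_nodes(div)[-1]

# Both yield raw text nodes, so trim the value before using it.
print([n.data.strip() for n in non_blank])  # → ['Game Developer']
print(last.data.strip())                    # → Game Developer
```

In this sample, text()[last()] happens to pick the right node because nothing follows "Game Developer" inside the div; with more markup after it, the last text node could be a different one.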


Tomalak


Thanks. Using test() is not working for me; I just get nothing back (see my response to choroba). – theFakeGramita Aug 13 '13 at 21:00
sorry, meant to say 'text()' – theFakeGramita Aug 13 '13 at 21:19
You do use `//` at the start of the XPath expression? – Tomalak Aug 13 '13 at 21:30
Yes, I am using that. I'm not sure why (maybe just some other issue I had), but now your first solution is working (using /text()[not(normalize-space() = '')] after the step that selects my div). I also notice that using //text() selects the text in the h1 and NOT the text outside it... interesting. Can you explain how your solution works? – theFakeGramita Aug 13 '13 at 22:07
Of course, using `//text()` selects all descendant text nodes, not just immediate children. My solution is quite straightforward; it reads "all child text nodes whose normalized content is not empty". You could write it as `text()[normalize-space(.) != '']` if that's more obvious to you. – Tomalak Aug 14 '13 at 07:41
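The difference between text() and //text() discussed in these comments can be sketched the same way: a child step looks only at the div's own text nodes, while a descendant step also reaches the text inside h1/span. This is an illustrative Python sketch with xml.dom.minidom and hypothetical helper names, not an XPath implementation.

```python
from xml.dom import minidom

# The question's markup, with the class value quoted so it parses as XML.
SAMPLE = """<div class="unique_id">
  <h1 class="parseasinTitle">
    <span> Game Title </span>
  </h1>
  Game Developer
</div>"""


def normalize_space(s):
    # Approximates XPath normalize-space().
    return " ".join(s.split())


def child_text_nodes(node):
    # text(): text nodes that are direct children of the context node.
    return [n for n in node.childNodes if n.nodeType == n.TEXT_NODE]


def descendant_text_nodes(node):
    # //text() below the context node: text nodes at any depth.
    found = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            found.append(child)
        elif child.nodeType == child.ELEMENT_NODE:
            found.extend(descendant_text_nodes(child))
    return found


div = minidom.parseString(SAMPLE).documentElement

# Keep only the non-blank values, normalized, for comparison.
children = [normalize_space(n.data) for n in child_text_nodes(div) if normalize_space(n.data)]
descendants = [normalize_space(n.data) for n in descendant_text_nodes(div) if normalize_space(n.data)]
print(children)     # → ['Game Developer']
print(descendants)  # → ['Game Title', 'Game Developer']
```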

score 0 · Answer 2 · answered Aug 13 '13 at 20:31


The conditions in square brackets (the "predicate") filter the nodes selected by the preceding step. Evaluated on the div, `self::h1/span` is empty, because the div is not an h1 at the same time, so the negation is always satisfied and the whole div (with all of its text) is returned. But if you used child instead of self, which was probably your original intent, you would not get the expected text either; you would get nothing, because it means "search for a div with unique_id that does not have an h1/span child", and this div does have one.
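The predicate reasoning above can be checked by evaluating both location paths by hand from the div. This is a hypothetical Python sketch with xml.dom.minidom (the helper names are illustrative); it relies on the XPath rule that not() applied to a node-set is true exactly when the node-set is empty.

```python
from xml.dom import minidom

# The question's markup, with the class value quoted so it parses as XML.
SAMPLE = """<div class="unique_id">
  <h1 class="parseasinTitle">
    <span> Game Title </span>
  </h1>
  Game Developer
</div>"""


def element_children(node, name):
    # child::name : element children with the given tag name.
    return [c for c in node.childNodes
            if c.nodeType == c.ELEMENT_NODE and c.tagName == name]


def self_h1_span(ctx):
    # self::h1/span : the context node itself, only if it is an <h1>,
    # followed by that node's <span> children.
    selves = [ctx] if ctx.tagName == "h1" else []
    return [s for h in selves for s in element_children(h, "span")]


def child_h1_span(ctx):
    # h1/span (child axis): <span> children of the context node's <h1> children.
    return [s for h in element_children(ctx, "h1") for s in element_children(h, "span")]


div = minidom.parseString(SAMPLE).documentElement

# not(node-set) is true exactly when the node-set is empty.
print(not self_h1_span(div))   # → True: the div passes, so all of its text comes back
print(not child_h1_span(div))  # → False: the div is filtered out, so nothing comes back
```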

If you want text, specify text():

//div/text()[last()]


choroba


Thanks. I'm trying to find the example where I got that from, but I can't. I thought child made more sense too, but the example was using self... I tried this, and when I use text() I get nothing back. This may be because the command I'm using on this XPath target is already supposed to do the job of getting the text from the element (a command like storeText). If I'm using a command that is meant to get the text from a target element, is there a way to specify that it should not get anything in those nested tags? – theFakeGramita Aug 13 '13 at 20:59
