47

I tried to search for nodes containing the text 'Yahoo' under '/doc/story/content'. It returns the 'content' node, but I need the exact text node that contains 'Yahoo', or its parent.

<doc>
    <story>
        <content id="201009281450332423">
            <ul>MSW NYNES NYPG1 DILMA</ul>
            <p> <k> Yahoo, made </k> it nice </p>
            <p>
               <author>-v-</author>
            </p>
        </content>
    </story>
</doc>

XPath: "/doc/story/content[contains(., 'Yahoo')]"

Vjy

2 Answers

58

Since you need only the text nodes that contain the text Yahoo, use the following XPath.

//text()[contains(., 'Yahoo')]

This should return only the text nodes that contain Yahoo (case-sensitive).
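
For anyone who wants to try this from a script, here is a rough Python sketch. One assumption to flag loudly: the standard-library ElementTree does not support `text()` or `contains()` in its XPath subset, so the code below emulates `//text()[contains(., 'Yahoo')]` by walking the tree rather than evaluating the expression. The helper name `text_nodes_containing` is made up for this example.

```python
import xml.etree.ElementTree as ET

XML = """<doc>
    <story>
        <content id="201009281450332423">
            <ul>MSW NYNES NYPG1 DILMA</ul>
            <p> <k> Yahoo, made </k> it nice </p>
            <p>
               <author>-v-</author>
            </p>
        </content>
    </story>
</doc>"""


def text_nodes_containing(root, needle):
    """Yield (parent_tag, text) for every text node that contains needle.

    ElementTree stores text nodes as elem.text (text before the first child)
    and child.tail (text after a child, which belongs to the child's parent).
    """
    for elem in root.iter():
        if elem.text and needle in elem.text:
            yield elem.tag, elem.text
        for child in elem:
            if child.tail and needle in child.tail:
                yield elem.tag, child.tail


root = ET.fromstring(XML)
matches = list(text_nodes_containing(root, "Yahoo"))
for parent_tag, text in matches:
    print(parent_tag, repr(text))
```

Each result carries the tag of the parent element alongside the text, which covers the "or its parent" part of the question.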

Nakilon
Ravish
  • 1
    What is the difference between this answer and @Jon's? – Nakilon Oct 09 '15 at 08:56
  • Case insensitive: //text()[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜÉÈÊÀÁÂÒÓÔÙÚÛÇÅÏÕÑŒ', 'abcdefghijklmnopqrstuvwxyzäöüéèêàáâòóôùúûçåïõñœ'),'yahoo')] – Stefan Steiger Jun 02 '17 at 13:46
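
The `translate()` trick in the comment above maps each upper-case character to its lower-case counterpart before comparing. Below is a small Python sketch of the same idea, assuming only the ASCII letters are mapped (the comment's longer alphabets extend the same tables). It walks the tree with the standard-library ElementTree instead of evaluating XPath, and the function name and sample document are invented for illustration.

```python
import string
import xml.etree.ElementTree as ET

# Same role as translate(., 'ABC...', 'abc...') in the XPath expression.
TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def text_nodes_containing_ci(root, needle):
    """Yield each text node whose lower-cased form contains needle."""
    needle = needle.translate(TO_LOWER)
    for elem in root.iter():
        # Text nodes owned by elem: its leading text plus its children's tails.
        texts = [elem.text] + [child.tail for child in elem]
        for text in texts:
            if text and needle in text.translate(TO_LOWER):
                yield text


# Hypothetical sample document, not taken from the question.
root = ET.fromstring(
    "<doc><p>YAHOO news</p><p>Try <b>yahoo</b> or Yahoo!</p><p>Google</p></doc>"
)
found = list(text_nodes_containing_ci(root, "yahoo"))
print(found)
```

Note that the results come out grouped by owning element, not in strict document order.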
42

Your XML is malformed. </content></doc></story> should be </content></story></doc>.

Apart from that, the XPath you would want is

/doc/story/content//*[contains(., 'Yahoo')]

(select any descendant element of <content> whose string value contains the text "Yahoo"; with the XML above this matches the <p> and also the <k> nested inside it)
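
To see which elements this kind of expression picks up, here is a hedged Python sketch. It does not evaluate the XPath itself (the standard-library ElementTree has no `contains()`); it emulates `content//*[contains(., 'Yahoo')]` by comparing each descendant's string value, and then narrows the hits to the deepest ones. The names `string_value`, `hits`, and `deepest` are made up for this example.

```python
import xml.etree.ElementTree as ET

XML = """<doc>
    <story>
        <content id="201009281450332423">
            <ul>MSW NYNES NYPG1 DILMA</ul>
            <p> <k> Yahoo, made </k> it nice </p>
            <p>
               <author>-v-</author>
            </p>
        </content>
    </story>
</doc>"""


def string_value(elem):
    """Concatenate all descendant text, like the XPath string value of '.'."""
    return "".join(elem.itertext())


root = ET.fromstring(XML)
content = root.find("./story/content")

# Every descendant element of <content> whose string value contains the text.
hits = [
    e for e in content.iter()
    if e is not content and "Yahoo" in string_value(e)
]
tags = [e.tag for e in hits]

# Keep only hits with no matching child element, i.e. the innermost matches.
# The same filter in XPath would be the extra predicate
# [not(*[contains(., 'Yahoo')])].
deepest = [
    e.tag for e in hits
    if not any("Yahoo" in string_value(child) for child in e)
]

print(tags, deepest)
```

The second list is what you would want when the text can sit any number of levels down and only the innermost element is of interest.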

Jon
  • This works great if it's one level down. How do I make it work for multiply nested tags? – Vjy Jun 22 '11 at 16:01
  • @Vjy: I'm not sure what you mean. Can you give an example? – Jon Jun 22 '11 at 16:03
  • I updated the XML above with an additional tag; it should select the K tag instead of the P tag. This is just an example; the text node can be n levels deep. – Vjy Jun 22 '11 at 17:48
  • @Vjy: this does exactly what you asked for. – Emiliano Poggi Jun 22 '11 at 21:00
  • 1
    text() is a node test not a string. contains() expects strings. See http://stackoverflow.com/a/9493870/695671 Your solution may appear to work, but I have a case with text nodes within text nodes in which case it fails. – Jason S Jan 21 '14 at 06:14
  • @JasonS: That situation did not cross my mind (how did you manage to do it? programmatically?). I have corrected the answer accordingly. Thank you for pointing that out, I feel I learned something new. – Jon Jan 21 '14 at 10:27
  • @Jon I did it as in your updated answer. I am getting content from text nodes in odt files using PHP SimpleXMLElement. The odt often has paragraphs with tabs and spaces represented like `Jon`, in which case searching using `contains(text(),"Jon")` will fail, but `contains(.,"Jon")` will work. – Jason S Jan 21 '14 at 22:54
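
The difference between `contains(text(), ...)` and `contains(., ...)` discussed in these comments comes from how XPath 1.0 converts a node-set to a string: only the first node in document order is used. Here is a small Python sketch of that rule, emulated with the standard-library ElementTree. The `<tab/>` element and the sample paragraph are hypothetical stand-ins, not the actual odt markup.

```python
import xml.etree.ElementTree as ET


def first_text_child(elem):
    """Return the first direct text node of elem, or '' if there is none.

    This mirrors what contains(text(), ...) sees in XPath 1.0: the node-set
    text() is reduced to its first node before the comparison.
    """
    if elem.text:
        return elem.text
    for child in elem:
        if child.tail:
            return child.tail
    return ""


def string_value(elem):
    """Return all descendant text joined, which is what '.' compares."""
    return "".join(elem.itertext())


# Hypothetical paragraph: text, an empty inline element, then more text.
p = ET.fromstring("<p>Hello <tab/>Jon</p>")

via_text = "Jon" in first_text_child(p)  # only looks at "Hello "
via_dot = "Jon" in string_value(p)       # looks at "Hello Jon"
print(via_text, via_dot)
```

With this input the first check misses the match and the second finds it, which is the behaviour described above.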