
I am trying to find a certain piece of text in any text node in a document. So far my statement looks like this:

```ruby
doc.xpath("//text() = 'Alliance Consulting'") do |node|
  ...
end
```

This obviously does not work. Can anyone suggest a better alternative?

kkurian
dagda1
  • Are you sure you want to find the text node? I think it's more likely that you really want to find the element containing the text node. I would suggest `//*[. = 'Alliance Consulting']` – Michael Kay Feb 22 '11 at 08:38
  • @Michael Kay: I agree that it's better not to select text nodes (particularly in a mixed-content data model such as XHTML). But I would use `//*[. = 'Alliance Consulting'][not(* = 'Alliance Consulting')]` to select the innermost elements with that string value. – Feb 24 '11 at 23:04
  • Your question might be more valuable if you removed the Ruby code. Not everyone will recognize it, and it doesn't seem relevant to your question. – jpaugh Jan 25 '16 at 21:59
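A minimal sketch of the comments' approach, selecting elements rather than text nodes. It assumes Ruby's bundled REXML library in place of the Nokogiri-style `doc.xpath` from the question (so no extra gem is needed) and uses a hypothetical document in which the phrase is wrapped in nested elements; the XPath expressions themselves are plain XPath 1.0.

```ruby
require "rexml/document"

# Assumption: REXML (bundled with Ruby) stands in for the question's doc.xpath.
# Hypothetical document: the phrase sits inside nested elements, so <r>, <b>
# and <i> all have the same string value.
xml = "<r><b><i>Alliance Consulting</i></b></r>"
doc = REXML::Document.new(xml)

# First comment: every element whose string value equals the phrase.
matching = REXML::XPath.match(doc, "//*[. = 'Alliance Consulting']").map(&:name)

# Second comment: keep only elements with no child element that has the same
# string value, i.e. the innermost ones.
innermost = REXML::XPath.match(
  doc,
  "//*[. = 'Alliance Consulting'][not(* = 'Alliance Consulting')]"
).map(&:name)

p matching   # <r>, <b> and <i>
p innermost  # only <i>
```

The second predicate is what keeps wrappers such as `<r>` and `<b>` out of the result; without it every ancestor whose string value is exactly the phrase is returned too.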


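The expression `//text() = 'Alliance Consulting'` evaluates to a boolean: it is true if any text node in the document has exactly that string value, and it selects no nodes.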

For example, with this test document:

```xml
<r>
    <t>Alliance Consulting</t>
    <s>
        <p>Test string
            <f>Alliance Consulting</f>
        </p>
    </s>
    <z>
        Alliance Consulting
        <y>
            Other string
        </y>
    </z>
</r>
```

It returns true, of course, because at least one text node (the one inside `<t>`, for example) has exactly that value.

The expression you need should evaluate to a node-set, so move the comparison into a predicate:

```
//text()[. = 'Alliance Consulting']
```

For example, the expression

```
count(//text()[normalize-space() = 'Alliance Consulting'])
```

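against the above document returns 3: `normalize-space()` strips the whitespace around the text directly inside `<z>`, so that node matches as well, whereas a plain `. =` comparison only matches the text in `<t>` and `<f>`.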
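The difference between the exact and the whitespace-normalized predicate can be checked with a short Ruby sketch. Again this assumes Ruby's bundled REXML library rather than the Nokogiri-style `doc.xpath` in the question; `REXML::XPath.each` yields each node of a node-set result, which is the kind of loop the question was aiming for.

```ruby
require "rexml/document"

# The test document from above; indentation is preserved, so the text
# directly inside <z> carries leading and trailing whitespace.
xml = <<~XML
  <r>
      <t>Alliance Consulting</t>
      <s>
          <p>Test string
              <f>Alliance Consulting</f>
          </p>
      </s>
      <z>
          Alliance Consulting
          <y>
              Other string
          </y>
      </z>
  </r>
XML

# Assumption: REXML (bundled with Ruby) stands in for the question's doc.xpath.
doc = REXML::Document.new(xml)

# Exact comparison: the expression is a node-set, so it can be iterated.
exact_parents = []
REXML::XPath.each(doc, "//text()[. = 'Alliance Consulting']") do |text|
  exact_parents << text.parent.name
end

# normalize-space() trims and collapses whitespace before comparing.
normalized = REXML::XPath.match(
  doc, "//text()[normalize-space() = 'Alliance Consulting']"
)

p exact_parents
p normalized.map { |text| text.parent.name }
p normalized.size
```

With this document the exact comparison should match only the text inside `<t>` and `<f>`, while the `normalize-space()` version also picks up the padded text directly inside `<z>`, which accounts for the count of 3.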

To select text nodes that contain 'Alliance Consulting' anywhere in their string value (e.g. 'Alliance Consulting provides great services'), use:

```
//text()[contains(., 'Alliance Consulting')]
```
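A similar sketch contrasts exact equality with `contains()` when the phrase is only part of a longer text node. It makes the same assumption (Ruby's bundled REXML instead of the question's `doc.xpath`) and uses a hypothetical document; `parent_names` is a hypothetical helper introduced only for this illustration.

```ruby
require "rexml/document"

# Hypothetical document: <a> holds the phrase inside a longer sentence.
xml = "<r><a>Alliance Consulting provides great services</a>" \
      "<b>Alliance Consulting</b><c>Other string</c></r>"
doc = REXML::Document.new(xml)

# Hypothetical helper: names of the elements that own the matching text nodes.
def parent_names(doc, xpath)
  REXML::XPath.match(doc, xpath).map { |text| text.parent.name }
end

exact    = parent_names(doc, "//text()[. = 'Alliance Consulting']")
contains = parent_names(doc, "//text()[contains(., 'Alliance Consulting')]")

p exact     # only <b>
p contains  # <a> and <b>
```

`contains()` is a plain substring test on each text node's value, so it is also case-sensitive and does not normalize whitespace.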

Do note that adjacent text nodes should be merged into one once the parser has processed the document.

Flack