52

If I have a bunch of elements like:

<p>A paragraph <ul><li>Item 1</li><li>Apple</li><li>Orange</li></ul></p>

Is there a built-in method in Nokogiri that would get me all p elements that contain the text "Apple"? (The example element above would match, for instance).

the Tin Man
Zando

4 Answers

64

Nokogiri can do this (now) using jQuery extensions to CSS:

require 'nokogiri'

html = '
<html>
  <body>
    <p>foo</p>
    <p>bar</p>
  </body>
</html>
'

doc = Nokogiri::HTML(html)
doc.at('p:contains("bar")').text.strip
=> "bar"
the Tin Man
  • 1
    If you replace bar with other text, such as "google encrypted \"google drive\" this year", it produces an error. Any ideas how to properly escape the " character? – Emad Elsaid Jul 29 '13 at 14:31
  • Try using `"` instead of the embedded quotes? – the Tin Man Jul 30 '13 at 01:59
  • Is there any way to do this without knowing what type of element contains the text? If I do `doc.at(':contains("bar")')` (i.e. without specifying a `p` element) then I get the whole document. – crantok Oct 04 '18 at 18:16
  • 1
    It's okay, I found the answer `doc.at(':contains("foo"):not(:has(:contains("foo")))')` here https://makandracards.com/makandra/38803-find-the-innermost-dom-element-that-contains-a-given-string – crantok Oct 04 '18 at 18:56
55

Here is an XPath that works:

require 'nokogiri'

doc = Nokogiri::HTML(DATA)
p doc.xpath('//li[contains(text(), "Apple")]')

__END__
<p>A paragraph <ul><li>Item 1</li><li>Apple</li><li>Orange</li></ul></p>
the Tin Man
Aaron Patterson
  • 1
    I prefer this one since it returns all instances of matching nodes, compared to the accepted answer, which only returns 1 node. – Kok A. Aug 06 '21 at 02:55
7

You can also do this very easily with Nikkou:

doc.search('p').text_includes('bar')
Tom
6

Try using this XPath:

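# The leading dot in .//* limits the search to descendants of each p;
# a bare //* inside the predicate would search the whole document.
p = doc.xpath('//p[.//*[contains(text(), "Apple")]]')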
andre-r