nokogiri select paragraph with text match

Question

I wrote a scraper and I am trying to get only the text of the paragraph that includes "On Snow Feel".

I am not sure how to have Nokogiri pull out the paragraph whose text matches a given string.

At the moment I have `boards[:onthesnowfeel] = html.css(".reviewfold p").text`, but this captures all the paragraphs. The paragraphs will not always be in the same order, so I can't just use an index such as [2].

What method would you use to scrape the paragraph that matches the text "On Snow Feel"?

<div id="review" class="reviewfold">
<p>The <strong>Salomon A</strong><b>assassin</b>&nbsp;Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. </p>
<p><b>Approximate Weight</b>: Moew mix is pretty normal</p>
<p><strong>On Snow Feel:&nbsp;</strong>At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium voluptatum.</p>
<p><strong>Powder:&nbsp;</strong>It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. </p>
</div>

Try `html.css(".reviewfold p").find { |e| e.text =~ /On Snow Feel/ }.text`. — sschmeck, Feb 06 '16 at 23:19
See http://stackoverflow.com/questions/1474688/nokogiri-how-to-select-nodes-by-matching-text. Note that if you want to match text *at the start* of a paragraph, you'll have to use XPath: `doc.xpath("//*[@class='reviewfold']//p[starts-with(.,'On Snow Feel')]")` — sshaw, Feb 07 '16 at 02:11

score 1 · Accepted Answer · answered Feb 08 '16 at 07:25


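You could use `Enumerable#find` in combination with a regexp match (`=~`) to get the content of the desired element. Note that `find` returns nil when no paragraph matches, so the trailing `.text` call would raise a NoMethodError in that case.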

html.css(".reviewfold p").find { |e| e.text =~ /On Snow Feel/ }.text


sschmeck
