I wrote a scraper, and I am trying to get only the text of the paragraph that includes "On Snow Feel".

I'm not sure how to have Nokogiri pull out the paragraph whose text matches a given string.

At the moment I have `boards[:onthesnowfeel] = html.css(".reviewfold p").text`, but this captures all the paragraphs. The paragraphs will not always be in the same order, so I can't just use an index such as `[2]`.

What method would you use to scrape the paragraph that matches the text "On Snow Feel"?

<div id="review" class="reviewfold">
<p>The <strong>Salomon A</strong><b>assassin</b>&nbsp;Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. </p>
<p><b>Approximate Weight</b>: Moew mix is pretty normal</p>
<p><strong>On Snow Feel:&nbsp;</strong>At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium voluptatum.</p>
<p><strong>Powder:&nbsp;</strong>It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. </p>
</div>
soniccool
  • Try `html.css(".reviewfold p").find { |e| e.text =~ /On Snow Feel/ }.text`. – sschmeck Feb 06 '16 at 23:19
  • That worked! @sschmeck – soniccool Feb 06 '16 at 23:31
  • See http://stackoverflow.com/questions/1474688/nokogiri-how-to-select-nodes-by-matching-text. Note that if you want to match text *at the start* of a paragraph, you'll have to use XPath: `doc.xpath("//*[@class='reviewfold']//p[starts-with(.,'On Snow Feel')]")` – sshaw Feb 07 '16 at 02:11

1 Answer

You could use `Enumerable#find` in combination with the regexp match operator `=~` to get the content of the desired element.

html.css(".reviewfold p").find { |e| e.text =~ /On Snow Feel/ }.text
sschmeck