How to get all elements via CSS class

Question

I am trying to scrape this page using Nokogiri to get all the elements with class name of "teaser".

If I check the page with jQuery, I can see there are 25 elements:

$(".teaser").length => 25

However, when using Nokogiri, I only get the first teaser:

teasers = doc.css('.teaser')
teasers.count => 1

Where am I going wrong? How do I get all the teasers?

If you look at the output of "doc.to_html", you will see that it contains only one teaser element. - dnsh, Sep 12 '16 at 18:22
You should take a look at http://stackoverflow.com/questions/13789583/html-is-read-before-fully-loaded-using-open-uri-and-nokogiri - dnsh, Sep 12 '16 at 18:35

score 1 · Accepted Answer · edited Sep 13 '16 at 19:15

That document appears to contain a large number of null bytes for some reason, and this causes Nokogiri/LibXML to treat the document as ending partway through, so only the content before the first null byte is parsed.

You should be able to fix it by preprocessing the contents to remove the nulls. If page contains the text of the webpage:

page.gsub!(/\x00/, '')

Then use Nokogiri on page as before.
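
As a rough illustration of that preprocessing step, here is a minimal sketch that does not touch the network. The sample markup and the helper name strip_nulls are made up for the example and are not taken from the page in the question. It uses String#delete rather than gsub!, which has the same effect here but returns a new string instead of modifying page in place:

```ruby
# Hypothetical helper: return a copy of the HTML with all NUL bytes removed,
# so the parser does not stop at the first one.
def strip_nulls(html)
  html.delete("\0")
end

# Made-up sample standing in for the downloaded page text.
raw = "<div class=\"teaser\">one</div>\0\0<div class=\"teaser\">two</div>"
clean = strip_nulls(raw)

puts raw.count("\0")    # prints 2 (NUL bytes before cleaning)
puts clean.count("\0")  # prints 0 (none left after cleaning)

# The cleaned string can then be parsed as before, for example:
#   doc = Nokogiri::HTML(clean)
#   teasers = doc.css('.teaser')
```

Counting the null bytes first is also a quick way to confirm that they are what is truncating the parsed document.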

edited Sep 13 '16 at 19:15

the Tin Man

answered Sep 12 '16 at 18:57

matt
