
I have the following line in a long loop

page = Nokogiri::HTML(open(topic[:url].first)).xpath('//ul[@class = "pages"]//li').first

Sometimes my Ruby application crashes, raising the "End of file reached" exception on this line.

How can I resolve this problem? Just a begin/rescue/end block?

It is a script that performs a forum backup, so it is important that it doesn't skip any thread.

Thanks in advance.

Roxas Shadow

2 Answers


In addition to @Phrogz's excellent advice (in particular about at_css with the simpler expression), I would fetch the raw HTML (content) separately and check that it is not empty before parsing it:

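# open returns an IO-like object, so read it into a String first (Ruby 3.0+ needs URI.open)
content = open(topic[:url].first).read
page = unless content.strip.empty?
  Nokogiri::HTML(content).xpath('//ul[@class = "pages"]//li').first
end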
Seamus Abshere
  • Thanks for the replies, but I already run `next unless topic[:url].first.page_exists? '//ul[@class = "pages"]//li'` before doing `page = Nokogiri::HTML(open(topic[:url].first)).xpath('//ul[@class = "pages"]//li').first`. #page_exists? is a method I added to String: `begin; !Nokogiri::HTML(open(self)).to_s.empty?; rescue Exception => e; false; end` – Roxas Shadow Jul 25 '12 at 23:00

I would suggest that you first fix the underlying issue so that you do not get this error.

  • Does the same URL always cause the problem? (Output it in your log files.) If so, perhaps you need to URI encode the URL.
  • Is it random, and therefore likely related to a connection hiccup or server problem? If so, you should rescue the specific error and then retry one or more times to get the crucial data.
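
The "rescue the specific error and retry" idea could be sketched roughly as follows. This is only a minimal sketch: with_retries, attempts, base_delay and errors are hypothetical names made up for illustration, and it assumes the failure surfaces as EOFError (which is what "end of file reached" normally is). Add other error classes to the list if your log shows them.

```ruby
# Hypothetical helper (not part of Nokogiri or open-uri): runs the block and
# retries it when one of the listed errors is raised.
def with_retries(attempts: 3, base_delay: 1.0, errors: [EOFError])
  tries = 0
  begin
    tries += 1
    yield tries                   # the block receives the current attempt number
  rescue *errors
    raise if tries >= attempts    # out of attempts: re-raise so no thread is skipped silently
    sleep(base_delay * tries)     # wait a little longer after each failure
    retry                         # run the begin block again
  end
end

# Possible usage inside the loop from the question (assumes open-uri and nokogiri are loaded):
#
#   page = with_retries(attempts: 5) do
#     Nokogiri::HTML(open(topic[:url].first)).at_css('ul.pages li')
#   end
```

Because the last failure is re-raised instead of swallowed, the backup stops loudly rather than skipping a thread. Whether a handful of retries is enough depends on why the server drops the connection.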

Secondarily, you should know that the CSS syntax for that query is far simpler:

page = Nokogiri.HTML(...).at_css('ul.pages li')
  • Not only is this less than half the bytes, it allows for cases like <ul class="foo pages"> that the XPath would miss.
  • Using at_css (or at_xpath) is the same as .css(...).first, but is faster and simpler.
Phrogz
  • It could well be a restriction on the server (the application crashed on call #458), but I cannot be sure that this is the cause... – Roxas Shadow Jul 25 '12 at 23:08