
I am trying to scrape a website which has multiple page results like "1, 2, 3, 4, 5...". Every pagination number is a link to another page and I need to scrape every page. So far I came up with this:

while lien = page.link_with(:text => link_number.to_s)
  link_number = link_number + 1
  body = page.body
  html_body = Nokogiri::HTML(body)
  html_body.css('#personne tbody tr').each do |person|
    puts person.css('td').first.text.to_s
  end
  page = lien.click
end

But this never scrapes the last page.

Please help me write better code that scrapes the last page.

the Tin Man
David Geismar
  • Welcome to Stack Overflow. Please supply a _minimal_ example of the HTML that demonstrates what you're trying to handle. Also, when working with Mechanize it's *NEVER* necessary to use `Nokogiri::HTML(body)` to get a DOM of the HTML. Mechanize already uses Nokogiri and you can easily access its internal DOM. – the Tin Man Jun 10 '15 at 22:32

1 Answer


The problem is that the last page has no link to a next page. On that page `link_with` returns nil, so the `while` condition fails and the loop body, which does the scraping, never runs for it.

As suggested here you'll need something like this:

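loop do
  # Look up the link to the next page; link_with returns nil on the last page.
  lien = page.link_with(:text => link_number.to_s)
  link_number += 1
  # Scrape the current page through Mechanize's own Nokogiri document.
  page.parser.css('#personne tbody tr').each do |person|
    puts person.css('td').first.text
  end
  # Stop only after the current page has been scraped.
  break unless lien
  page = lien.click
end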
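
To see the difference between the two loop shapes without hitting a real site, here is a minimal sketch in plain Ruby. `FakePage`, the sample names, and the two helper methods are made up for illustration and are not part of Mechanize; the only thing carried over is the assumption that `link_with` returns nil when no link matches, and that numbering starts at page 1 so the first link to follow is "2".

```ruby
# Hypothetical stand-in for a paginated site: each page holds its rows and a
# hash of link text => target page. FakePage is NOT part of Mechanize.
FakePage = Struct.new(:rows, :links) do
  # Mimics link_with returning nil when no link has the given text.
  def link_with(text:)
    links[text]
  end
end

page3 = FakePage.new(%w[Eve], {})
page2 = FakePage.new(%w[Carol Dan], { '3' => page3 })
page1 = FakePage.new(%w[Alice Bob], { '2' => page2 })

# Shape from the question: the condition is checked before the body runs,
# so the page without a "next" link is never scraped.
def scrape_with_while(page)
  seen = []
  link_number = 2
  while (lien = page.link_with(text: link_number.to_s))
    link_number += 1
    seen.concat(page.rows)
    page = lien # stands in for lien.click
  end
  seen
end

# Shape from the answer: scrape first, then decide whether to continue.
def scrape_with_loop(page)
  seen = []
  link_number = 2
  loop do
    lien = page.link_with(text: link_number.to_s)
    link_number += 1
    seen.concat(page.rows)
    break unless lien
    page = lien # stands in for lien.click
  end
  seen
end

p scrape_with_while(page1) # last page's rows are missing
p scrape_with_loop(page1)  # rows from all three pages
```

With the three fake pages above, the `while` version returns the rows of pages 1 and 2 only, while the `loop` version also includes the row from page 3.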
egwspiti