I'm looking for help with the best way to loop through successive pages on a website while scraping the relevant data from each page.
For example, I want to go to a specific site (Craigslist in the example below), scrape the data from the first page, go to the next page, scrape all relevant data, and so on, until the very last page.
In my script I'm using a while loop since it seemed to make the most sense to me. However, it doesn't appear to be working properly and is only scraping data from the first page.
Can someone familiar with Ruby/Mechanize point me in the right direction on the best way to accomplish this task? I've spent countless hours trying to figure this out and feel like I'm missing something very basic.
Thanks in advance for your help.
require 'mechanize'
require 'pry'
# Initialize
agent = Mechanize.new { |agent| agent.user_agent_alias = 'Mac Safari'}
url = "http://charlotte.craigslist.org/search/rea"
page = agent.get(url)
# Create an empty array to dump contents into
property_results = []
# Scrape all successive pages from craigslist
while page.link_with(:dom_class => "button next") != nil
  next_link = page.link_with(:dom_class => "button next")
  page.css('ul.rows').map do |d|
    property_hash = { title: d.at_css('a.result-title.hdrlnk').text }
    property_results.push(property_hash)
  end
  page = next_link.click
end
UPDATE: I found this, but still no dice:
Ruby Mechanize: Follow a Link
@pguardiario
require 'mechanize'
require 'httparty'
require 'pry'
# Initialize
agent = Mechanize.new
url = "http://charlotte.craigslist.org/search/rea"
page = agent.get(url)
# Create an empty array to dump contents into
property_results = []
# Scrape all successive pages from craigslist
while link = page.at('[rel=next]')
  page.css('ul.rows').map do |d|
    property_hash = { title: d.at_css('a.result-title.hdrlnk').text }
    property_results.push(property_hash)
  end
  link = page.at('[rel=next]')
  page = agent.get link[:href]
end
pry(binding)
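For comparison, here is a minimal sketch of one way the loop could be restructured. It is not a confirmed fix. Two things stand out in both scripts above: the `while` condition looks for the "next" link before scraping, so the last page (which has no such link) is never scraped, and iterating over `ul.rows` with `at_css` yields only one title per matched `ul` rather than one per listing. The sketch scrapes first and checks for the link afterwards. The method name `scrape_titles` and the `fetch` callable are hypothetical, introduced only for illustration, and the selectors are copied from the scripts above on the assumption that they still match the site's markup.

```ruby
# Collect listing titles from every results page, including the last one.
# `page` must respond to `search` and `at` (a Mechanize::Page does);
# `fetch` is any callable that turns an href into the next page.
# Both names are hypothetical helpers, not part of Mechanize.
def scrape_titles(page, fetch)
  results = []
  loop do
    # Scrape the current page first, so the final page is not skipped.
    # Selecting the anchors directly gives one entry per listing.
    page.search('a.result-title.hdrlnk').each do |a|
      results << { title: a.text.strip }
    end

    # Only afterwards look for the "next" link; stop when there is none.
    next_link = page.at('[rel=next]')
    href = next_link && next_link[:href]
    break if href.nil? || href.empty?

    page = fetch.call(href)
  end
  results
end

# Assumed usage with Mechanize (needs network access, so it is left commented):
#   require 'mechanize'
#   agent = Mechanize.new
#   first = agent.get('http://charlotte.craigslist.org/search/rea')
#   property_results = scrape_titles(first, ->(href) { agent.get(href) })
```

Passing the fetch step in as a callable keeps the pagination logic separate from the HTTP client, which also makes it possible to try the loop against stub pages without hitting the site.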