Scraping/extracting data using mechanize

Question

Using Mechanize, I would like to scrape information from this website => http://www.africanbookscollective.com

This is the information I would like to gather:

All Books listed under the category Fiction

Under this category, I want:

Author name
Book Title
ISBN
Publisher
Country

I have figured out that this url => http://www.africanbookscollective.com/browse/african-literature/fiction gives me the information I want.

This is my current code:

require 'awesome_print'
require 'rubygems'
require 'mechanize'

agent = Mechanize.new
page = agent.get('http://www.africanbookscollective.com/browse/african-literature/fiction')
a = page.links.each do |link|
  puts link.text
end

ap a

This is my first time using Mechanize, so I am not sure exactly how it differs from Nokogiri. The main reason I am using it in this case is that I need to extract information across 38 pages (the complete list of books tagged Fiction).
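For the 38-page listing, one possible approach is to follow the pager link from one page to the next. The following is only a minimal sketch, not a tested scraper for this site: the helper name `each_page` is hypothetical, and it assumes the pager link's text matches "Next", which is a guess because the site's pager markup is not shown here. `link_with` and `click` are regular Mechanize calls.

```ruby
# Walks a paginated listing by following a "next" link until none is left.
# `page` is expected to behave like a Mechanize::Page (responds to #link_with),
# and the returned link must respond to #click, as Mechanize links do.
# ASSUMPTION: the pager link text matches /next/i; adjust for the real site.
# Returns the number of pages visited.
def each_page(page, next_text: /next/i, max_pages: 100)
  visited = 0
  while page && visited < max_pages
    yield page
    visited += 1
    next_link = page.link_with(text: next_text)
    page = next_link && next_link.click
  end
  visited
end

# Usage (performs network requests, so it is left commented out):
#   require 'mechanize'
#   agent = Mechanize.new
#   first = agent.get('http://www.africanbookscollective.com/browse/african-literature/fiction')
#   each_page(first) { |page| puts page.uri }
```

The `max_pages` cap is there only as a safety net in case the "next" link on the last page points back to itself.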

ISSUES:

I am getting a really long output from Mechanize that includes links I don't need.
The information I need is not in a div class. It is in a dl class, and I have tried googling how to select a dl class but have not had any luck so far.
Each time I have performed a regex operation to remove the links I do not want, I get an empty array back.

Can someone, anyone, please help me think of a new way to approach this problem? I really would appreciate feedback.

PS: Here is an image that might shed some more light


score 0 · Answer 1 · answered Jan 01 '14 at 21:58


You can use scrape4me.com to get the raw output for further processing in your project (Mechanize). I don't know Mechanize, but maybe this can help. Good luck.


Youss

