
I would like to use scraperwiki and python to build a scraper that will scrape large amounts of information off of different sites. I am wondering if it is possible to point to a single URL and then scrape the data off of each of the links within that site.

For example: A site would contain information about different projects, each within its own individual link. I don't need a list of those links but the actual data contained within them.

The scraper would be looking for the same attributes on each of the links.

Does anyone know how or if I could go about doing this?

Thanks!

Cetus

1 Answer


Check out BeautifulSoup with urllib2.

http://www.crummy.com/software/BeautifulSoup/

A (very) rough example link scraper would look like this:

from bs4 import BeautifulSoup
import urllib2

# Fetch the page and parse it.
c = urllib2.urlopen(url)
contents = c.read()
soup = BeautifulSoup(contents, "html.parser")

# Collect all the anchor tags on the page.
links = soup.find_all("a")

Then just write a for loop to do that many times over and you're set!
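That loop might be sketched like this. Note the URLs, the `h1.title` selector, and the `fetch` helper are all illustrative assumptions — in a real scraper `fetch` would be something like `urllib2.urlopen(url).read()` (or `urllib.request` on Python 3); here the responses are faked with inline HTML so the sketch is self-contained:

```python
from bs4 import BeautifulSoup

# Hypothetical site: an index page linking to project pages,
# each carrying the same attribute (an <h1 class="title">).
PAGES = {
    "http://example.com/": """
        <html><body>
          <a href="/project/1">Project 1</a>
          <a href="/project/2">Project 2</a>
        </body></html>""",
    "http://example.com/project/1": """
        <html><body><h1 class="title">Alpha</h1></body></html>""",
    "http://example.com/project/2": """
        <html><body><h1 class="title">Beta</h1></body></html>""",
}

def fetch(url):
    # Stand-in for urllib2.urlopen(url).read().
    return PAGES[url]

# Step 1: collect the project links from the index page.
index = BeautifulSoup(fetch("http://example.com/"), "html.parser")
links = [a["href"] for a in index.find_all("a")]

# Step 2: visit each link and scrape the same attribute from every page.
results = []
for href in links:
    page = BeautifulSoup(fetch("http://example.com" + href), "html.parser")
    title = page.find("h1", class_="title")
    results.append(title.get_text())

print(results)  # ['Alpha', 'Beta']
```

Swap `fetch` for a real HTTP call and change the selector to whatever attribute the project pages share, and the same two-step pattern applies.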

dblarons