Python web crawler

Question

Is there a Python crawler that pulls all the data from a web page? For example: http://www.bestbuy.com/site/HTC+-+One+S+4G+Mobile+Phone+-+Gradient+Blue+%28T-Mobile%29/4980512.p?id=1218587135819&skuId=4980512&contract_desc= On this page, the customer reviews span two pages, 1 and 2. I want to crawl this URL and get the content of both pages. Is this possible with a Python crawler?

Also, do Python crawlers support all the modern GET/POST technologies?

Instead, you could check whether Best Buy has an API that would work for you. — kyle k, May 06 '14 at 21:40

score 12 · Answer 1 · answered Jul 26 '12 at 13:32

You could use Scrapy:

Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

score 3 · Answer 2 · edited May 23 '17 at 12:15

If you want to crawl a whole site, see this post. If you only want to process specific pages and analyze their content (that is, you already know the URLs you want to process), try BeautifulSoup, which lets you do things like:

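from urllib.request import urlopen  # Python 3; on Python 2 this was urllib2.urlopen
from bs4 import BeautifulSoup       # assumes the beautifulsoup4 package is installed

page = urlopen(url)  # url is the page you want to process
soup = BeautifulSoup(page.read(), 'html.parser')
for f in soup.find_all('form'):
    target_url = f.get('action')  # None if the form has no action attribute
    # do something with each one of the forms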
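
For the case in the question, where the reviews are spread over page 1 and page 2, the same idea can be extended to follow pagination links. The following is only a minimal sketch using the standard library: `LinkCollector`, `fetch`, `crawl_pages`, and the `is_review_page` callback are hypothetical names made up for illustration, and it assumes the review pages are reachable through ordinary `<a href>` links. If the site loads reviews with JavaScript, this approach will not see them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch(url):
    """Download a page with a plain GET request and return its HTML as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_pages(start_url, is_review_page, fetch=fetch, max_pages=10):
    """Visit start_url and every linked page accepted by is_review_page.

    Returns a dict mapping each visited URL to its HTML.
    """
    pages = {}
    queue = [start_url]
    while queue and len(pages) < max_pages:
        url = queue.pop(0)
        if url in pages:
            continue  # already visited
        html = fetch(url)
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            absolute = urljoin(url, href)  # resolve relative links
            if is_review_page(absolute) and absolute not in pages:
                queue.append(absolute)
    return pages
```

You would call it with the product URL and a predicate that recognizes the review-page links on that site, for example `crawl_pages(url, lambda u: "page=" in u)`; the predicate shown here is a guess and has to be adapted to the real link format. The `max_pages` limit keeps the crawl from running away if the predicate matches too much.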

answered Jul 26 '12 at 14:47 by gutes · edited May 23 '17 at 12:15 by Community
