-4

Is there a Python crawler that can pull all the data from a web page? For example: http://www.bestbuy.com/site/HTC+-+One+S+4G+Mobile+Phone+-+Gradient+Blue+%28T-Mobile%29/4980512.p?id=1218587135819&skuId=4980512&contract_desc= On this page, the customer reviews span two pages, 1 and 2. I want to crawl this URL and get the content of both pages. Is this possible with a Python crawler?

Also, do Python crawlers support all the modern GET/POST technologies?

Rajeev

2 Answers

12

You could use Scrapy:

Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

3

If you want to crawl a site, see this post. If you only want to process some pages and analyze their content (meaning you already know the URLs you want to process), try BeautifulSoup. It lets you do things like:

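from urllib.request import urlopen  # Python 3 replacement for urllib2
from bs4 import BeautifulSoup

page = urlopen(url)  # url: the page you want to process
soup = BeautifulSoup(page.read(), "html.parser")
for f in soup.find_all('form'):
    target_url = f.get('action')  # None if the form has no action attribute
    # do something with each one of the forms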
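If BeautifulSoup is not installed, roughly the same form extraction can be sketched with only the standard library's html.parser. This swaps in a different parser from the one above, so treat it as an illustration of the idea and not as BeautifulSoup's API. The names FormActionParser and form_actions, the sample markup, and the example.com base URL are all made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormActionParser(HTMLParser):
    """Collect the absolute target URL of every <form> in a document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # attrs is a list of (name, value) pairs; a form without an
            # action attribute submits to the page's own URL.
            action = dict(attrs).get("action") or ""
            self.actions.append(urljoin(self.base_url, action))


def form_actions(html, base_url):
    """Return the resolved action URL of each form found in html."""
    parser = FormActionParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.actions


# Hypothetical sample markup and base URL, used only for illustration.
sample = """
<form action="/search" method="get"><input name="q"></form>
<form method="post"><input name="review"></form>
"""
print(form_actions(sample, "http://example.com/site/page.p?id=1"))
```

Resolving each action against the page URL with urljoin matters once you start following the targets, since relative paths such as /search are common in real pages.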
gutes