The problem is that the products (mobiles) on this site are loaded dynamically via an XHR request.
You have to simulate that request in Scrapy in order to get the necessary data. For more info on the subject, see:
Here's the spider for your case. Note that I got the URL from the Network tab of Chrome Developer Tools:
from scrapy.item import Item, Field
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector


class BigCMobilesItem(Item):
    title = Field()
    price = Field()


class BigCMobilesSpider(BaseSpider):
    name = "bigcmobile_spider"
    allowed_domains = ["bigcmobiles.in"]
    # XHR endpoint that returns the product list for the first page
    start_urls = [
        "http://www.bigcmobiles.in/Handler/ProductShowcaseHandler.ashx?ProductShowcaseInput={%22PgControlId%22:1152173,%22IsConfigured%22:true,%22ConfigurationType%22:%22%22,%22CombiIds%22:%22%22,%22PageNo%22:1,%22DivClientId%22:%22ctl00_ContentPlaceHolder1_ctl00_ctl07_Showcase%22,%22SortingValues%22:%22%22,%22ShowViewType%22:%22%22,%22PropertyBag%22:null,%22IsRefineExsists%22:true,%22CID%22:%22CU00091056%22,%22CT%22:0,%22TabId%22:0}&_=1369724967084"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # Each product sits in its own <div class="bucket">
        mobiles = hxs.select("//div[@class='bucket']")
        print mobiles

        for mobile in mobiles:
            item = BigCMobilesItem()
            item['title'] = mobile.select('.//h4[@class="mtb-title"]/text()').extract()[0]
            try:
                item['price'] = mobile.select('.//span[@class="mtb-price"]/label[@class="mtb-ofr"]/text()').extract()[1].strip()
            except IndexError:
                # Fall back when the expected price text node is missing
                item['price'] = 'n/a'
            yield item
Save it in spider.py, and run via scrapy runspider spider.py -o output.json. Then in output.json you will see:
{"price": "13,999", "title": "Samsung Galaxy S Advance i9070"}
{"price": "9,999", "title": "Micromax A110 Canvas 2"}
{"price": "25,990", "title": "LG Nexus 4 E960"}
{"price": "39,500", "title": "Samsung Galaxy S4 I9500 - Black"}
...
These are products from the first page. In order to get mobiles from other pages, take a look at the XHR request the site is using: it has a PageNo parameter, which looks like what you need.
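As a rough sketch of that idea, the snippet below builds the URL for any page by swapping the PageNo value in the URL from the spider. The helper names page_url and page_urls are hypothetical, and it assumes the server accepts the same request with only PageNo changed (including the unchanged trailing _ value, which I have not verified). The number of pages is also something you would have to find out yourself.

```python
# URL copied from the spider above; only the PageNo value is varied.
FIRST_PAGE_URL = (
    "http://www.bigcmobiles.in/Handler/ProductShowcaseHandler.ashx"
    "?ProductShowcaseInput={%22PgControlId%22:1152173,%22IsConfigured%22:true,"
    "%22ConfigurationType%22:%22%22,%22CombiIds%22:%22%22,%22PageNo%22:1,"
    "%22DivClientId%22:%22ctl00_ContentPlaceHolder1_ctl00_ctl07_Showcase%22,"
    "%22SortingValues%22:%22%22,%22ShowViewType%22:%22%22,%22PropertyBag%22:null,"
    "%22IsRefineExsists%22:true,%22CID%22:%22CU00091056%22,%22CT%22:0,%22TabId%22:0}"
    "&_=1369724967084"
)

# The fragment of the URL that carries the page number for page 1.
PAGE_MARKER = "%22PageNo%22:1,"


def page_url(page_no):
    """Return the showcase URL with PageNo set to page_no (hypothetical helper)."""
    return FIRST_PAGE_URL.replace(PAGE_MARKER, "%22PageNo%22:{},".format(page_no))


def page_urls(last_page):
    """Return the URLs for pages 1..last_page (last_page is an assumption)."""
    return [page_url(n) for n in range(1, last_page + 1)]
```

You could then set start_urls = page_urls(3) in the spider, where 3 is just a placeholder for however many pages the site actually has.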
Hope that helps.