I'm new to Scrapy and Python. I have been working on extracting data from two websites, and both work well when I fetch them directly with Python. After some investigation, I want to crawl these websites with Scrapy:
- homedepot.com.mx/comprar/es/miguel-aleman/home (works perfectly)
- vallenproveedora.com.mx/ (doesn't work)
Can someone tell me how I can make the second link work?
I see this message:
DEBUG: Crawled (200) <GET http://www.vallenproveedora.com.mx/> (referer: None) ['partial']
but I can't find out how to solve it.
I would appreciate any help and support. Here is the code and the log:
items.py
from scrapy.item import Item, Field


class CraigslistSampleItem(Item):
    title = Field()
    link = Field()
Test.py (spider folder)
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from craigslist_sample.items import CraigslistSampleItem


class MySpider(BaseSpider):
    name = "craig"
    allowed_domains = ["vallenproveedora.com.mx"]
    #start_urls = ["http://www.homedepot.com.mx/webapp/wcs/stores/servlet/SearchDisplay?searchTermScope=&filterTerm=&orderBy=&maxPrice=&showResultsPage=true&langId=-5&beginIndex=0&sType=SimpleSearch&pageSize=&manufacturer=&resultCatEntryType=2&catalogId=10052&pageView=table&minPrice=&urlLangId=-5&storeId=13344&searchTerm=guante"]
    start_urls = ["http://www.vallenproveedora.com.mx/"]

    def parse(self, response):
        # Select every list item on the page.
        titles = response.xpath('//ul/li')
        for item in titles:
            # Text and target of the link inside each list item.
            title = item.xpath("a/text()").extract()
            link = item.xpath("a/@href").extract()
            print(title, link)
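
To narrow this down, here is a minimal sketch of how the `['partial']` marker from the log could be detected inside `parse`. It assumes the marker is exposed through the `flags` list that Scrapy attaches to response objects. The helper name `is_partial` and the stand-in response objects are hypothetical and only there so the snippet runs without Scrapy installed.

```python
from types import SimpleNamespace


def is_partial(response):
    """Return True if the response carries the 'partial' flag."""
    # Fall back to an empty list so the helper also works on objects
    # that have no `flags` attribute (or have it set to None).
    return "partial" in (getattr(response, "flags", None) or [])


# Hypothetical stand-ins for Scrapy responses, used only for illustration.
fake_response = SimpleNamespace(
    url="http://www.vallenproveedora.com.mx/", flags=["partial"]
)
complete_response = SimpleNamespace(url="http://example.com/", flags=[])

print(is_partial(fake_response))
print(is_partial(complete_response))
```

In the spider, this could be called at the top of `parse`, for example `if is_partial(response): self.log("Partial response for %s" % response.url)`, to confirm whether the body being parsed was received completely.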