I tried using igaggini's example from the question "Scrapy: Follow link to get additional Item data?" but can't seem to get it to work with my code.
I'm pretty sure I have the right XPaths. The output should be the second paragraph in the first div of each page linked from the countries page.
Here is my main file, recursive.py.
from scrapy.spider import BaseSpider
from bathUni.items import BathuniItem
from scrapy.selector import HtmlXPathSelector
from scrapy.http.request import Request
from urlparse import urljoin

class recursiveSpider(BaseSpider):
    name = 'recursive'
    allowed_domains = ['http://www.bristol.ac.uk/']
    start_urls = ['http://www.bristol.ac.uk/international/countries/']

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        links = []

        # scrape main page to get row links
        for i in range(1, 154):
            xpath = ('//*[@id="all-countries"]/li[*]/ul/li[*]/a' .format (i+1))
            link = hxs.select(xpath).extract()
            links.append(link)

        # parse links to get content of the linked pages
        for link in links:
            item = BathuniItem()
            item ['Qualification'] = hxs.select('//*[@id="uobcms-content"]/div/div/div[1]/p[2]')
            yield item
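The spider imports urljoin but never calls it, and the href in the output further down looks site-relative (it starts with /international/countries/). A minimal sketch of resolving such hrefs against the start URL, using only the standard library, might look like this. The helper name to_absolute and the two sample hrefs are hypothetical and only there for illustration.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2, as imported in the spider

# Start URL taken from the spider's start_urls.
BASE_URL = "http://www.bristol.ac.uk/international/countries/"


def to_absolute(base_url, hrefs):
    """Resolve each (possibly relative) href against base_url."""
    return [urljoin(base_url, href) for href in hrefs]


# Hypothetical hrefs; the real ones would come from the listing page.
sample_hrefs = ["/international/countries/example-country/", "other-country/"]
absolute_urls = to_absolute(BASE_URL, sample_hrefs)

for url in absolute_urls:
    print(url)
```

Absolute URLs like these are what a follow-up request for each country page would need.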
Here is my items file.
from scrapy.item import Item, Field

class BathuniItem(Item):
    Country = Field()
    Qualification = Field()
The output I receive is not what I want. My CSV file is full of entries like this:
<HtmlXPathSelector xpath='//*[@id="all-countries"]/li[*]/ul/li[*]/a' data=u'<a href="/international/countries/albani'>