I'm trying to create a function that takes care of a recurring task in multiple spiders. It involves yielding a request, which seems to break the function. This question is a follow-up to this question.
import scrapy
import json
import re


class BaseSpider(scrapy.Spider):
    start_urls = {}

    def test(self, response, cb, xpath):
        self.logger.info('Success')
        for url in response.xpath(xpath).extract():
            req = scrapy.Request(response.urljoin(url), callback=cb)
            req.meta['category'] = response.meta.get('category')
            yield req
With yield req in the code, the "Success" message no longer appears in the log and the callback does not seem to be called. When yield req is commented out, the "Success" message does show up. Although I don't think the issue is in the spider itself, here is the spider's code, followed by a minimal sketch of what I suspect is happening:
# -*- coding: utf-8 -*-
import scrapy

from crawling.spiders import BaseSpider


class testContactsSpider(BaseSpider):
    """ Test spider """
    name = "test"

    start_urls = {}
    start_urls['test'] = 'http://www.thewatchobserver.fr/petites-annonces-montres#.WfMaIxO0Pm3'

    def parse(self, response):
        self.logger.info('Base page: %s', response.url)
        self.test(response, self.parse_page, '//h3/a/@href')

    def parse_page(self, response):
        self.logger.info('Page: %s', response.url)
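I suspect it may be related to the fact that a function containing yield becomes a generator, whose body does not run until something iterates over it. Here is a minimal sketch outside Scrapy that seems to reproduce the behaviour I'm seeing (the function names log_only and log_then_yield are made up for illustration):

# Minimal sketch (plain Python, no Scrapy) of what I think is happening.

def log_only():
    print('Success')          # runs as soon as the function is called

def log_then_yield():
    print('Success')          # only runs once the generator is iterated
    yield 'a request'

log_only()                    # prints 'Success'
gen = log_then_yield()        # prints nothing: the body has not started yet
for item in gen:              # now 'Success' is printed and 'a request' comes out
    print(item)

If that is indeed the cause, I'm not sure what the correct way is to call test from parse so that the requests still get scheduled.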