
I have read Scrapy: Follow link to get additional Item data? and followed it, but it is not working; it is probably some simple mistake, so I am posting the source code of my spider.

import scrapy
from scrapy.spider import Spider
from scrapy.selector import Selector

class MySpider1(Spider):
    name = "timeanddate"
    allowed_domains = ["http://www.timeanddate.com"]
    start_urls = (
        'http://www.timeanddate.com/holidays/',
    )

    def parse(self, response):
        countries = Selector(response).xpath('//div[@class="fixed"]//li/a[contains(@href, "/holidays/")]')

        for item in countries:

            link = item.xpath('@href').extract()[0]
            country = item.xpath('text()').extract()[0]

            linkToFollow = self.allowed_domains[0] + link + "/#!hol=1"

            print link  # link
            print country  # text in a HTML tag
            print linkToFollow

            request = scrapy.Request(linkToFollow, callback=self.parse_page2)


    def parse_page2(self, response):
        print "XXXXXX"
        hxs = HtmlXPathSelector(response)

        print hxs

I am trying to get a list of all holidays for each country; that is why I need to get to another page.

I cannot understand why parse_page2 is not called.
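For context on the symptom: Scrapy only schedules requests that the callback returns or yields, so a Request that is built and then discarded does nothing. The difference can be reproduced without Scrapy at all (toy tuples stand in for Request objects here):

```python
def parse_dropping_requests(links):
    # mirrors the spider above: a request is built but never handed back
    for link in links:
        request = ("GET", link)
    # implicit return None: the caller gets nothing to schedule

def parse_yielding_requests(links):
    # yield passes each request back to the caller (Scrapy's engine)
    for link in links:
        yield ("GET", link)

print(parse_dropping_requests(["/holidays/us"]))        # None
print(list(parse_yielding_requests(["/holidays/us"])))  # [('GET', '/holidays/us')]
```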

WebOrCode

1 Answer


I could make your example work using Link Extractors.

Here is an example:

#-*- coding: utf-8 -*-
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.lxmlhtml import LxmlLinkExtractor

class TimeAndDateSpider(CrawlSpider):
    name = "timeanddate"
    allowed_domains = ["timeanddate.com"]
    start_urls = [
        "http://www.timeanddate.com/holidays/",
    ]


    rules = (
        Rule(
            LxmlLinkExtractor(restrict_xpaths=('//div[@class="fixed"]//li/a[contains(@href, "/holidays/")]',)),
            callback='second_page',
        ),
    )

    #2nd page
    def second_page(self,response):
        print "second page - %s" % response.url

I will keep trying to make the Request callback example work.

André Teixeira
  • I have tried it and it is not working, not even with yield – WebOrCode Feb 19 '15 at 18:56
  • Yes, I did read it; for now I am trying just to go from one page to another, and after that I will add data to Items. – WebOrCode Feb 19 '15 at 18:59
  • This code is working. I have one question: in http://doc.scrapy.org/en/latest/topics/link-extractors.html#module-scrapy.contrib.linkextractors.sgml it is said that SgmlLinkExtractor is deprecated. Do you have a special reason why you picked it? – WebOrCode Feb 19 '15 at 19:37
  • Your code will go to a link, e.g. http://www.timeanddate.com/holidays/new-zealand, but I want to go to http://www.timeanddate.com/holidays/new-zealand/#!hol=1. Is this possible to do? Also, after visiting http://www.timeanddate.com/holidays/new-zealand/#!hol=1, I would like to go to http://www.timeanddate.com/holidays/new-zealand/2016#!hol=1. Is this also possible? – WebOrCode Feb 19 '15 at 21:57
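On the last comment: the link extractor accepts a process_value callable that can rewrite every extracted URL before a request is made, so the fragment could be appended there. Note, though, that the part after # is never sent over HTTP; timeanddate.com interprets it in the browser, so Scrapy downloads the same page either way. A minimal sketch of such a helper (the function name is illustrative, not part of Scrapy):

```python
def add_hol_fragment(url):
    """Append the site's client-side fragment to an extracted holidays URL."""
    if "#!hol=" in url:  # fragment already present, leave the URL alone
        return url
    return url.rstrip("/") + "/#!hol=1"

# Passed to the extractor, e.g.:
# LxmlLinkExtractor(restrict_xpaths=(...), process_value=add_hol_fragment)
print(add_hol_fragment("http://www.timeanddate.com/holidays/new-zealand"))
# http://www.timeanddate.com/holidays/new-zealand/#!hol=1
```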