
I have read Scrapy: Follow link to get additional Item data? and followed it, but it is not working; it is probably some simple mistake, so I am posting the source code of my spider.

import scrapy
from scrapy.spider import Spider
from scrapy.selector import Selector

class MySpider1(Spider):
    name = "timeanddate"
    allowed_domains = ["http://www.timeanddate.com"]
    start_urls = (
        'http://www.timeanddate.com/holidays/',
    )

    def parse(self, response):
        countries = Selector(response).xpath('//div[@class="fixed"]//li/a[contains(@href, "/holidays/")]')

        for item in countries:

            link = item.xpath('@href').extract()[0]
            country = item.xpath('text()').extract()[0]

            linkToFollow = self.allowed_domains[0] + link + "/#!hol=1"

            print link  # link
            print country  # text in a HTML tag
            print linkToFollow

            request = scrapy.Request(linkToFollow, callback=self.parse_page2)


    def parse_page2(self, response):
        print "XXXXXX"
        hxs = HtmlXPathSelector(response)

        print hxs

I am trying to get a list of all holidays for each country; that is why I need to get to another page.

I cannot understand why parse_page2 is not called.
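For context on the symptom: Scrapy only schedules requests that the callback returns or yields, so a Request that is built and then discarded does nothing. The difference can be reproduced without Scrapy at all (toy tuples stand in for Request objects here):

```python
def parse_dropping_requests(links):
    # mirrors the spider above: a request is built but never handed back
    for link in links:
        request = ("GET", link)
    # implicit return None: the caller gets nothing to schedule

def parse_yielding_requests(links):
    # yield passes each request back to the caller (Scrapy's engine)
    for link in links:
        yield ("GET", link)

print(parse_dropping_requests(["/holidays/us"]))        # None
print(list(parse_yielding_requests(["/holidays/us"])))  # [('GET', '/holidays/us')]
```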

WebOrCode

1 Answer


I could make your example work using Link Extractors.

Here is an example:

#-*- coding: utf-8 -*-
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.lxmlhtml import LxmlLinkExtractor

class TimeAndDateSpider(CrawlSpider):
    name = "timeanddate"
    allowed_domains = ["timeanddate.com"]
    start_urls = [
        "http://www.timeanddate.com/holidays/",
    ]


    rules = (
        Rule(
            LxmlLinkExtractor(restrict_xpaths=('//div[@class="fixed"]//li/a[contains(@href, "/holidays/")]',)),
            callback='second_page',
        ),
    )

    #2nd page
    def second_page(self,response):
        print "second page - %s" % response.url

I will keep trying to make the Request callback example work.

André Teixeira
  • I have tried it and it is not working, not even with yield – WebOrCode Feb 19 '15 at 18:56
  • Yes, I did read it; for now I am trying just to go from one page to another, and after that I will add data to Items. – WebOrCode Feb 19 '15 at 18:59
  • This code is working. I have one question: in http://doc.scrapy.org/en/latest/topics/link-extractors.html#module-scrapy.contrib.linkextractors.sgml it is said that SgmlLinkExtractor is deprecated. Do you have a special reason why you picked it? – WebOrCode Feb 19 '15 at 19:37
  • Your code will go to a link, e.g. http://www.timeanddate.com/holidays/new-zealand, but I want to go to http://www.timeanddate.com/holidays/new-zealand/#!hol=1. Is this possible to do? Also, after visiting http://www.timeanddate.com/holidays/new-zealand/#!hol=1, I would like to go to http://www.timeanddate.com/holidays/new-zealand/2016#!hol=1. Is this also possible? – WebOrCode Feb 19 '15 at 21:57
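On the last comment: the link extractor accepts a process_value callable that can rewrite every extracted URL before a request is made, so the fragment could be appended there. Note, though, that the part after # is never sent over HTTP; timeanddate.com interprets it in the browser, so Scrapy downloads the same page either way. A minimal sketch of such a helper (the function name is illustrative, not part of Scrapy):

```python
def add_hol_fragment(url):
    """Append the site's client-side fragment to an extracted holidays URL."""
    if "#!hol=" in url:  # fragment already present, leave the URL alone
        return url
    return url.rstrip("/") + "/#!hol=1"

# Passed to the extractor, e.g.:
# LxmlLinkExtractor(restrict_xpaths=(...), process_value=add_hol_fragment)
print(add_hol_fragment("http://www.timeanddate.com/holidays/new-zealand"))
# http://www.timeanddate.com/holidays/new-zealand/#!hol=1
```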