
I try to call the getNext() function from the main parse function that Scrapy calls, but it never gets called.

class BlogSpider(scrapy.Spider):
      # User agent.
      name = 'Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19'
      start_urls = ['http://www.tricksforums.org/best-free-movie-streaming-sites-to/']

      def getNext(self):
        print("Getting next ... ")
        # Check if next link in DB is valid and crawl.
        try:
          nextUrl = myDb.getNextUrl()
          urllib.urlopen(nextUrl).getcode()
          yield scrapy.Request(nextUrl['link'])
        except IOError as e:
          print("Server can't be reached", e.code)
          yield self.getNext()

      def parse(self, response):
        print("Parsing link: ", response.url)
        # Get all urls for further crawling.
        all_links = hxs.xpath('*//a/@href').extract()
        for link in all_links:
          if validators.url(link) and not myDb.existUrl(link) and not myDb.visited(link):
            myDb.addUrl(link)
        print("Getting next?")
        yield self.getNext()

I tried with and without yield before it. What's the issue? And what is this yield supposed to do? :)

Alessandro

1 Answer


You are trying to yield a generator, but meant to yield from a generator.

If you are on Python 3.3+, you can use yield from:

yield from self.getNext()

Or, simply return the generator instead: `return self.getNext()`.
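To see the difference outside of Scrapy, here is a minimal standalone sketch (the function names are illustrative, not from your spider):

```python
import types

def inner():
    yield 1
    yield 2

def wrong():
    # Yields the generator object itself: the consumer receives a
    # single item that is a <generator>, not the values 1 and 2.
    yield inner()

def right():
    # Delegates to the inner generator (Python 3.3+): the consumer
    # receives 1, then 2, exactly as if inner() yielded them here.
    yield from inner()

print(list(right()))                                      # [1, 2]
print(isinstance(next(wrong()), types.GeneratorType))     # True
```

This is why Scrapy complains: `yield self.getNext()` hands Scrapy a generator object where it expects a `Request`, item, dict, or `None`.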

alecxe
    @Alessandro you should also have noticed the message on the console: `2017-06-19 15:42:49 [scrapy.core.scraper] ERROR: Spider must return Request, BaseItem, dict or None, got 'generator' in ` - please check out [this SO topic](https://stackoverflow.com/q/1756096/771848) about understanding generators. Thanks! – alecxe Jun 19 '17 at 19:44
    I had the "--nolog" flag .. yeah – Alessandro Jun 19 '17 at 19:46