
I am currently working on a Scrapy project in which I scrape content from internet forums.

I want my application to offer different scraping scenarios, so I wrote a method that should handle these scenarios based on some input flags:

def work_executor(self, response):
    yield from self.parse_categories(response)
    if not parse_only_categories:
        yield from self.parse_topics()

The parse_topics method should read the categories from MongoDB, send a request to each category, and parse the posts. However, parse_topics is never executed, even though the flag is set.
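To make the intended behaviour concrete, here is a minimal sketch of what parse_topics is described as doing. It is only an illustration under stated assumptions: `category_store` is a plain list standing in for the MongoDB collection, the `link` field, the `parse_posts` callback name, and the example domain are hypothetical, and each yielded dict stands in for a `scrapy.Request`.

```python
def parse_topics(category_store, base_domain):
    """Yield one request description per stored category.

    In a real spider each dict would be a scrapy.Request(url=..., callback=...).
    """
    for category in category_store:
        # "link" is a hypothetical field name for the stored category URL path.
        yield {"url": base_domain + category["link"], "callback": "parse_posts"}


BASE = "https://forum.example"  # placeholder domain
empty_store = []
filled_store = [{"link": "/forum/1"}, {"link": "/forum/2"}]

requests_when_empty = list(parse_topics(empty_store, BASE))
requests_when_filled = list(parse_topics(filled_store, BASE))
```

Note that when the store is empty the generator yields nothing, which from the outside looks the same as the method never running.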

work_executor is called from the parse method of my spider:

def parse(self, response):
    yield from self.work_executor(response)
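The control flow of parse and work_executor can be reproduced without Scrapy. The sketch below is a stand-in, not the real spider: plain strings replace `scrapy.Request` objects, the flag is stored as an instance attribute (an assumption, since the original reads a bare name), and the `log` list exists only to show when each generator body starts running.

```python
class SpiderSketch:
    """Stand-in for the spider; strings replace scrapy.Request objects."""

    def __init__(self, parse_only_categories=False):
        # Hypothetical flag storage; the original code reads a bare name.
        self.parse_only_categories = parse_only_categories
        self.log = []

    def parse_categories(self, response):
        self.log.append("parse_categories started")
        yield "category-1"
        yield "category-2"

    def parse_topics(self):
        # This body runs only once the consumer asks for the first topic.
        self.log.append("parse_topics started")
        yield "topic-request-1"

    def work_executor(self, response):
        yield from self.parse_categories(response)
        if not self.parse_only_categories:
            yield from self.parse_topics()

    def parse(self, response):
        yield from self.work_executor(response)


spider = SpiderSketch(parse_only_categories=False)
results = spider.parse(response=None)

first = next(results)
# After a single item, only parse_categories has begun executing.
log_after_first = list(spider.log)

remaining = list(results)
```

Because generators are lazy, parse_topics starts only after everything from parse_categories has been consumed; in this sketch it does run once the results are fully iterated.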

Earlier I did it this way, which worked:

def parse_categories(self, response):
    # some working code
    yield scrapy.Request(url=self.base_domain + link, callback=self.parse_topics)

However, this solution is not generic enough for my project, so I am looking for a way to achieve the same effect with the approach described in the first part of my post.

Any help would be greatly appreciated.

chodi
  • Are you familiar with the differences between yield and yield from? Check here: https://stackoverflow.com/questions/9708902/in-practice-what-are-the-main-uses-for-the-new-yield-from-syntax-in-python-3 – Rend Apr 09 '18 at 12:01
  • Yes, I am. Does it have anything to do with this problem? – chodi Apr 09 '18 at 12:18
  • This looks like it should work. It's hard to tell why it's not without seeing a [mcve] – stranac Apr 09 '18 at 13:07
