I have this code:

    def somefunc(self):
        ...
        if self.mynums >= len(self.totalnums):
            if 1 == 1:
                return self.crawlSubLinks()
            for num in self.nums:
                if 'hello' not in num:
                    continue
                if 0 == 1:
                    # Even though this is never reached, when using yield the crawler stops
                    # execution after the return statement.
                    # When using return instead of yield, execution continues as expected - why?
                    print("in it!")
                    yield SplashRequest(numfunc['asx'], self.xo, endpoint='execute', args={'lua_source': self.scripts['xoscript']})

    def crawlSubLinks(self):
        self.start_time = timer()
        print("IN CRAWL SUB LINKS")
        for link in self.numLinks:
            # The comma after `link` was missing in the original snippet.
            yield scrapy.Request(link, callback=self.examinenum, dont_filter=True)

As you can see, the SplashRequest is never reached, so its implementation is not important here. The goal is to keep sending requests by returning self.crawlSubLinks(). Now here is the problem:

When I use return before the SplashRequest that is never reached, the crawler continues its execution as expected by processing the new requests from crawlSubLinks. However, when I use yield before the SplashRequest that is never reached, the crawler stops after the return statement! Whether I use yield or return in a line that is never executed should not matter at all, right?

Why is this? I have been told that this comes down purely to how Python behaves. But how can I then have a yield statement within the for loop while still returning in the if statement above the for loop and not return a generator?

  • Can you extend your code to an [MCVE](https://stackoverflow.com/help/mcve)? – Alex Yu Mar 02 '19 at 13:44
  • I don't think that is necessary. The question is more on the theoretical part, and based on the two answers, the question is quite clear :) –  Mar 02 '19 at 13:46
  • Well, I'm glad you found the solution. Although having an MCVE in the question is always good – Alex Yu Mar 02 '19 at 13:48
  • Of course, I'll keep that in mind –  Mar 02 '19 at 13:49
  • Your problem has nothing to do with scrapy, so you should remove those tags and consider removing scrapy from the title of your question. – DisappointedByUnaccountableMod Mar 02 '19 at 13:55
  • Possible duplicate of [What does the "yield" keyword do?](https://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do) – Daniel Pryden Mar 02 '19 at 14:02
  • Possible duplicate of [Generator with return statement](https://stackoverflow.com/questions/37661068/generator-with-return-statement) – Paritosh Singh Mar 02 '19 at 14:08
  • The decision to turn a function into a generator is made not when you run it and encounter a yield, but much earlier. A conditional would have to be executed before the function could know that the yield is never reached, and your code cannot infer that beforehand. If there's a yield anywhere in the body, it's a generator. @asd – Paritosh Singh Mar 02 '19 at 14:11
  • @asd: The premise of your question is flawed: the `yield` statement *is* "reached" when the `def` statement is executed and the body of the function is turned into a function object, and that's when it has an effect on the function object that's produced. – Daniel Pryden Mar 02 '19 at 14:13
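
The point made in the last two comments can be checked directly. The following is a minimal sketch with made-up function names (`plain`, `with_dead_yield`), assuming CPython: it inspects the function objects without ever calling them, which shows that the decision is made when the `def` is compiled, not when the body runs.

```python
import inspect


def plain():
    return [1, 2, 3]


def with_dead_yield():
    return [1, 2, 3]
    yield  # never executed, but still part of the function body


# Neither function has been called; only the function objects are inspected.
plain_is_gen = inspect.isgeneratorfunction(plain)
dead_is_gen = inspect.isgeneratorfunction(with_dead_yield)

# On CPython the same information is stored as a flag on the code object.
flag_set = bool(with_dead_yield.__code__.co_flags & inspect.CO_GENERATOR)

print(plain_is_gen, dead_is_gen, flag_set)  # False True True
```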

2 Answers

Is the yield in a function? Outside of functions it gets run at the start of the script and halts because it's not "yielded" by a generator.

Put the code inside of a function and it won't happen. Same goes for async stuff outside of functions. Broke my head more than once.

NoSplitSherlock

That happens because a function that contains a yield statement anywhere in its body always returns a generator when it is called.
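
As a rough illustration of why the crawler appears to stop, here is a small sketch that uses plain strings in place of Scrapy requests (the names `sub_links` and `some_func` are made up). It assumes the caller simply iterates over whatever the callback returns. Inside a generator, `return value` ends the iteration and stores the value on the `StopIteration` exception, so the consumer never sees it.

```python
def sub_links():
    yield "request-1"
    yield "request-2"


def some_func():
    if 1 == 1:
        return sub_links()  # inside a generator this just ends the iteration
    if 0 == 1:
        yield "never reached"  # its presence alone makes some_func a generator


# What a consumer that iterates over the result sees: nothing at all.
items = list(some_func())

# The returned generator is only reachable through StopIteration.value.
hidden = None
gen = some_func()
try:
    next(gen)
except StopIteration as stop:
    hidden = list(stop.value)

print(items)   # []
print(hidden)  # ['request-1', 'request-2']
```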

Fanto
  • Even though the yield statement is never reached? :/ Code that is never reached cannot be expected to have any effect? –  Mar 02 '19 at 13:42
  • Yes, even if it never runs, because the compiler cannot know whether the statement will be reached – Fanto Mar 02 '19 at 13:44
  • But Python is not compiled? –  Mar 02 '19 at 13:46
  • @asd It *is* read at startup. Every python file/script itself gets run when you start or import it. That's why people use if __name__ == "__main__". Everything after that is protected from being run straight off the script. – NoSplitSherlock Mar 02 '19 at 13:47
  • Ahh thanks you two! I am glad that I tagged this python too. That was obviously the right choice then. Does this behavior have any name - in case I would like to read more about it? –  Mar 02 '19 at 13:49
  • No, the CPython interpreter first compiles the source code to bytecode and then executes that bytecode; being an interpreted language doesn't mean it is not compiled. If you are interested in the topic, there are a lot of articles that explain it; just search for how the Python compiler works. – Fanto Mar 02 '19 at 13:51
  • But I am not using CPython? Just Python –  Mar 02 '19 at 13:58
  • @asd Python is a language specification, not an implementation. You almost certainly _are_ using CPython, which is the most common implementation of that specification – roganjosh Mar 02 '19 at 14:04
  • Python is a language. CPython is the most common implementation of Python. If you are using Python and don't know which implementation you're using, you're most likely using CPython. @asd – Paritosh Singh Mar 02 '19 at 14:04
  • Thanks! But how can I then have a yield statement within the for loop while still returning in the if statement above the for loop and not return a generator? –  Mar 02 '19 at 14:06
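
One possible way to address that last question, sketched here with plain strings instead of real requests and with hypothetical names and parameters (`crawl_sub_links`, `done`, `links`, `nums`): accept that the function is a generator and delegate with `yield from` (Python 3.3+) rather than returning the other generator. This is only a sketch under those assumptions, not a drop-in replacement for the spider code above.

```python
def crawl_sub_links(links):
    for link in links:
        yield f"Request({link})"


def some_func(done, links, nums):
    if done:
        # Re-yield every item from the other generator, then stop this one.
        yield from crawl_sub_links(links)
        return
    for num in nums:
        if "hello" not in num:
            continue
        yield f"SplashRequest({num})"


first = list(some_func(True, ["a", "b"], ["hello-1"]))
second = list(some_func(False, ["a", "b"], ["hello-1", "bye"]))

print(first)   # ['Request(a)', 'Request(b)']
print(second)  # ['SplashRequest(hello-1)']
```

An alternative with the same effect is to keep the function free of `yield` altogether and move the loop into a separate generator method, so that each branch can simply `return` a generator.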