I am trying to quit a Python program by calling sys.exit(), but it does not seem to work.

The program structure is something like:

def func2(response):
    # does some scraping operations using Scrapy

def func1():
    Request(urls, callback=func2)

Here, func1 requests a list of URLs, and the callback, func2, is called for each response. I want to quit the program if something goes wrong in func2.

On checking the type of the object in func1, I found it is an http.Request object.

Also, since I am using Scrapy, whenever I call sys.exit() in func2, the next URL in the list is requested and the program continues.

I have also tried to use a global variable to stop the execution but to no avail.

Where am I going wrong?

praxmon

2 Answers

According to "How can I instruct a spider to stop itself?", you need to raise the CloseSpider exception:

from scrapy.exceptions import CloseSpider
raise CloseSpider('Done web-scraping for now')

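sys.exit() does not work here because Scrapy is built on Twisted: sys.exit() only raises SystemExit inside the callback, and the framework handles that exception instead of letting it end the process.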
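The effect can be reproduced without Scrapy. The sketch below uses a toy dispatcher (run_callbacks is a made-up stand-in, not Twisted's actual code) that traps whatever a callback raises, which is enough to show why the next URL is still processed after sys.exit().

```python
import sys


def run_callbacks(callbacks, responses):
    """Toy stand-in for an event loop: run each callback and trap anything it raises."""
    log = []
    for callback, response in zip(callbacks, responses):
        try:
            callback(response)
            log.append(("ok", response))
        except BaseException as exc:  # SystemExit derives from BaseException
            log.append((type(exc).__name__, response))
    return log


def func2(response):
    # sys.exit() does nothing more than raise SystemExit.
    sys.exit("something went wrong")


def func3(response):
    return response


log = run_callbacks([func2, func3], ["url-1", "url-2"])
print(log)  # [('SystemExit', 'url-1'), ('ok', 'url-2')]
```

The second callback still runs because the exception never reaches the top of the program.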

byron he

Even if we don't know how to completely stop, Python's mutable-object default binding "gotcha" can help us skip all callbacks from a certain point on.
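For anyone who has not run into the gotcha: a default argument value is evaluated once, when the function is defined, so a mutable default is shared by every call. A tiny illustration (append_item is just a made-up name):

```python
def append_item(item, bucket=[]):
    # The same list object is reused on every call that omits `bucket`.
    bucket.append(item)
    return bucket


first = append_item("a")
second = append_item("b")
print(first is second)  # True
print(second)           # ['a', 'b']
```

That shared object is what lets one callback flip a flag that all the other wrapped callbacks can see.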

Here is what you can do:

First, create a function that wraps other callback functions with a condition. Its second argument, cont, is bound to a mutable object (a list), so we can affect all callbacks after creating them.

def callback_gen(f, cont=[True]):
    def c(response):
        if cont[0]:
            f(response, cont=cont)
        else:
            print("skipping")  # possibly replace with pass
    return c

Now make some testing functions:

def func2(response, cont=None):
    print(response)
    print(cont)
    # this should prevent any following callback from running
    cont[0] = False

def func3(response, cont=None):
    print(response)
    print(cont)

Now create two callbacks. The first one is func2, which prevents the following ones from running.

f2 = callback_gen(func2)
f3 = callback_gen(func3)
f2("func2")  # runs func2, which sets cont[0] to False
f3("func3")  # prints "skipping"

I like it :)

Reut Sharabani