
I want to know how I can stop my program in the console with CTRL+C or something similar. The problem is that there are two threads in my program. Thread one crawls the web and extracts some data, and thread two displays this data in a readable format for the user. Both parts share the same database. I run them like this:

from threading import Thread
import ResultsPresenter

def runSpider():
    Thread(target=initSpider).start()
    Thread(target=ResultsPresenter.runPresenter).start()


if __name__ == "__main__":
    runSpider()

How can I do that?

OK, so I created my own thread class:

import threading

class MyThread(threading.Thread):
    """Thread class with a stop() method. The thread itself has to check
    regularly for the stopped() condition."""

    def __init__(self):
        super(MyThread, self).__init__()
        self._stop = threading.Event()

    def stop(self):
        self._stop.set()

    def stopped(self):
        return self._stop.isSet()

OK, so I will post snippets of resultPresenter and the crawler here. Here is the code of resultPresenter:

import webbrowser

from flask import Flask

import database  # the project's own package

# configuration
DEBUG = False
DATABASE = database.__path__[0] + '/database.db'

app = Flask(__name__)
app.config.from_object(__name__)
app.config.from_envvar('CRAWLER_SETTINGS', silent=True)

def runPresenter():
    url = "http://127.0.0.1:5000"
    webbrowser.open_new(url)
    app.run()

There are also two more methods here that I omitted: one of them connects to the database and the other loads the HTML template that displays the results. I repeat this until the conditions are met or the user stops the program (which is what I am trying to implement). There are also two other methods: one gets the initial link from the command line and the other validates the arguments; if the arguments are invalid, I don't run the crawl() method.

Here is a short version of the crawler:

def crawl(initialLink, maxDepth):
    # here I am setting initial values, lists etc.

    while not (depth >= maxDepth or len(pagesToCrawl) <= 0):
        # This is the main loop that stops when a certain depth is
        # reached or there is nothing left to crawl.
        # Here I am popping URLs from the URL queue, parsing them and
        # inserting interesting data into the database.
        pass

    parser.close()
    sock.close()
    dataManager.closeConnection()

Here is the init file which starts those modules in threads:

import ResultsPresenter, MyThread, time, threading

def runSpider():

    MyThread.MyThread(target=initSpider).start()
    MyThread.MyThread(target=ResultsPresenter.runPresenter).start()


def initSpider(): 

    import Crawler
    import database.__init__
    import schemas.__init__
    import static.__init__
    import templates.__init__

    link, maxDepth = Crawler.getInitialLink()
    if link:
        Crawler.crawl(link, maxDepth)



killall = False

if __name__ == "__main__":

    global killall
    runSpider()

    while True:

        try:
            time.sleep(1)

        except:

            for thread in threading.enumerate():
                thread.stop()

            killall = True
            raise
koleS

2 Answers


Killing threads is not a good idea, since (as you already said) they may be performing some crucial operations on the database. Instead, you can define a global flag that signals the threads to finish what they are doing and quit.

import time

killall = False

if __name__ == "__main__":
    runSpider()
    while True:
        try:
            time.sleep(1)
        except:
            # send a signal to the threads, for example:
            killall = True
            raise

Then, in each thread, check in a similar loop whether the killall variable is set to True. If it is, close all activity and let the thread finish.
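A minimal, self-contained sketch of this pattern is below. It swaps the plain global boolean for a `threading.Event` (a thread-safe flag from the standard library), and `worker` is a hypothetical stand-in for `initSpider` and `runPresenter`: it does small units of work and checks the flag between them. The main thread polls with `time.sleep()` so that CTRL+C arrives as `KeyboardInterrupt`, sets the flag, and then waits for the workers to return.

```python
import threading
import time

# Shared stop flag; a threading.Event stands in for the global `killall` boolean.
stop_event = threading.Event()

def worker(name, results):
    """Hypothetical worker: does small units of work, checking the flag between them."""
    units = 0
    while not stop_event.is_set():
        units += 1
        time.sleep(0.01)  # stand-in for one small piece of real work
    # Clean-up (closing sockets, DB connections, ...) would go here.
    results[name] = units

def main():
    results = {}
    threads = [threading.Thread(target=worker, args=(name, results))
               for name in ("spider", "presenter")]
    for t in threads:
        t.start()
    try:
        # Poll instead of a bare join() so CTRL+C reaches the main thread promptly.
        while any(t.is_alive() for t in threads):
            time.sleep(0.1)
    except KeyboardInterrupt:
        print("CTRL+C received, asking threads to finish...")
        stop_event.set()
    for t in threads:
        t.join()  # wait until every worker has done its clean-up
    return results
```

In the real program, `main()` would be called from the `if __name__ == "__main__":` block with the real thread functions in place of `worker`. Because the workers only look at the flag between units of work, each one stops after finishing its current unit rather than instantly.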

EDIT

First of all, the exception is rather obvious: you are passing a target argument to __init__, but your __init__ doesn't accept it. Do it like this:

class MyThread(threading.Thread):

    def __init__(self, *args, **kwargs):
        super(MyThread, self).__init__(*args, **kwargs)
        self._stop = threading.Event()

And secondly: you are not using my code. As I said, set the flag and check it in the thread. When I say "thread", I actually mean the handler, i.e. ResultsPresenter.runPresenter or initSpider. Show us the code of one of these and I'll try to show you how to handle stopping.
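For reference, here is a sketch of the complete class with `target` forwarding and the stop methods from the question, plus one way a plain target function can reach its own thread object: `threading.current_thread()`. Two adjustments are assumptions of this sketch rather than part of the answer: the event is stored as `_stop_event` instead of `_stop`, because in Python 3 `threading.Thread` uses a private `_stop` name internally and shadowing it can break `join()`; and the hypothetical `stop_all()` helper skips threads that are not `MyThread` instances, since `threading.enumerate()` also returns the main thread, which has no `stop()` method. `count_until_stopped` is an illustrative target.

```python
import threading
import time

class MyThread(threading.Thread):
    """Thread with a stop() method; the target has to check stopped() regularly."""

    def __init__(self, *args, **kwargs):
        super(MyThread, self).__init__(*args, **kwargs)  # forwards target=, args=, ...
        # Not named "_stop": threading.Thread uses that name internally in Python 3.
        self._stop_event = threading.Event()

    def stop(self):
        self._stop_event.set()

    def stopped(self):
        return self._stop_event.is_set()

def stop_all():
    # threading.enumerate() also lists the main thread, which has no stop() method.
    for thread in threading.enumerate():
        if isinstance(thread, MyThread):
            thread.stop()

def count_until_stopped(out):
    # Illustrative target: the running MyThread object is returned by current_thread().
    me = threading.current_thread()
    n = 0
    while not me.stopped():
        n += 1
        time.sleep(0.01)  # stand-in for one small unit of work
    out.append(n)

if __name__ == "__main__":
    out = []
    t = MyThread(target=count_until_stopped, args=(out,))
    t.start()
    time.sleep(0.1)
    stop_all()
    t.join()
    print("stopped after", out[0], "iterations")
```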

EDIT 2

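Assuming that the code of the crawl function is in the same file as killall (if it is not, read the flag through its module, e.g. `module.killall`, because `from module import killall` only copies the value at import time), you can do something like this: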

def crawl(initialLink, maxDepth):
    global killall
    # Initialization.
    while not killall and not (depth >= maxDepth or len(pagesToCrawl) <= 0):
        # note the killall variable in the while loop!
        # the other code
        pass
    parser.close()
    sock.close()
    dataManager.closeConnection()

So basically you are just saying: "Hey, thread, quit the loop now!" Alternatively, you can explicitly break out of the loop:

while not(depth >= maxDepth or len(pagesToCrawl) <= 0):
    # some code
    if killall:
        break

Of course, it will still take some time before the thread quits (it has to finish the current iteration and close the parser, socket, etc.), but it should quit safely. That's the idea, at least.
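When one iteration covers many URLs, the flag can also be re-checked before every URL, so a stop request is noticed after at most one page. The sketch below is a hypothetical rewrite, not the question's actual crawler: `fetch_links` is an assumed callback returning the links found on a page, `cleanup` stands in for `parser.close()`, `sock.close()` and `dataManager.closeConnection()`, and a `threading.Event` plays the role of `killall`. The `try`/`finally` makes the clean-up run however the loop ends.

```python
import threading
from collections import deque

stop_event = threading.Event()  # plays the role of the global killall flag

def crawl(initial_link, max_depth, fetch_links, cleanup=lambda: None):
    """Hypothetical breadth-first crawl that checks the stop flag before every URL.

    fetch_links(url) is an assumed callback returning the links found on `url`;
    cleanup() stands in for closing the parser, socket and database connection.
    Returns the URLs visited, in order.
    """
    visited = []
    seen = {initial_link}
    frontier = deque([initial_link])  # URLs at the current depth
    depth = 0
    try:
        while frontier and depth < max_depth and not stop_event.is_set():
            next_frontier = deque()
            # Inner loop handles one URL at a time, so a stop request is noticed
            # after the current page even if this depth holds thousands of URLs.
            while frontier and not stop_event.is_set():
                url = frontier.popleft()
                visited.append(url)
                for link in fetch_links(url):
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
            depth += 1
    finally:
        cleanup()  # runs on normal exit, on a stop request, or on an exception
    return visited
```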

freakish
  • This seems to be the solution I am looking for, but I still don't know how I can intercept a signal from the command line and how to stop both threads in code. Can you provide an example? Thanks anyway – koleS Jun 25 '12 at 08:04
  • This will work. A bare `try: except:` block (without specifying `Exception`) also catches `KeyboardInterrupt`, which CTRL+C raises and which does not derive from `Exception`. That's why I wrap `sleep` in `try` (the sleep does nothing, but we want a constant check). As for stopping threads: it depends on how they are implemented. :) – freakish Jun 25 '12 at 08:07
  • Actually it doesn't work: after pressing CTRL+C, KeyboardInterrupt is raised, but both modules are still running. By the way, can you explain the meaning of 'global' when applied to a variable? Is it seen by all modules in my package, or what? – koleS Jun 25 '12 at 08:17
  • @koleS `global` means that a variable is, well... global. :) In the sense that not only other functions can see it, but other threads as well. As I said: this is only for notification. You need to implement thread closing manually. Post a code of one of your threads and I will try to show you how to close it. – freakish Jun 25 '12 at 08:39
  • I showed you parts of resultsPresenter and crawler in first post. – koleS Jun 25 '12 at 09:52
  • @koleS Good, I've updated my answer once again. I hope it is clear now. :) – freakish Jun 25 '12 at 10:00
  • Yes, I figured that out as well; however, there are some problems for me. First, how do I stop the other part, the Flask application? There is no visible infinite loop for me to break, so I don't know how to stop it. The second problem is what you just said, that "it will quit after the loop is finished": one iteration of the while loop may have to process, for example, 15000 URLs, and I don't want to wait for that to finish. – koleS Jun 25 '12 at 10:02
  • @koleS I don't know anything about Flask, so I can't help you there, but there must be some kind of built-in stopping mechanism. As for processing 15000 URLs... well, if you can't alter the iteration, then unfortunately you won't be able to stop the process safely. But can't you split these 15000 URLs into 1000 groups of 15 URLs and iterate over the groups? I'm talking about a nested `for/while`, so you can check the `killall` flag inside (and literally break the inner loop). It is always a good idea to split big tasks into smaller ones. :) – freakish Jun 25 '12 at 10:10
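On the Flask side: one option, a swapped-in technique that neither answer here uses, is to replace `app.run()` with a WSGI server object you control, since a Flask `app` is an ordinary WSGI application. The sketch below uses the standard library's `wsgiref.simple_server` (a simple development server) with a stand-in app, `hello_app`; `start_presenter` and `stop_presenter` are hypothetical helper names. `serve_forever()` runs in a background thread, and calling `shutdown()` from another thread makes it return.

```python
import threading
from wsgiref.simple_server import make_server

def hello_app(environ, start_response):
    # Stand-in WSGI app; a Flask `app` object could be passed instead.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"results would be shown here\n"]

def start_presenter(app, host="127.0.0.1", port=5000):
    """Serve `app` from a background thread; returns (server, thread)."""
    server = make_server(host, port, app)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread

def stop_presenter(server, thread):
    server.shutdown()      # asks serve_forever() to return; must not run on its thread
    server.server_close()  # closes the listening socket
    thread.join()
```

The main thread would call `start_presenter(app)` next to the crawler thread and call `stop_presenter(...)` in its `KeyboardInterrupt` handler. Because the server thread is a daemon, it would not keep the process alive on its own either; the explicit `shutdown()` just lets it stop cleanly. The `wsgiref` server handles one request at a time, which may or may not be enough for the results page.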

Try this:

ps aux | grep python

Copy the ID of the process you want to kill and run:

kill -3 <process_id>

And in your code (adapted from here):

import signal
import sys

def signal_handler(signum, frame):
    print('You killed me!')
    sys.exit(0)

signal.signal(signal.SIGQUIT, signal_handler)
print('Kill me now')
signal.pause()
luke14free
  • This could be OK as well, but I am getting the following error: signal.signal(signal.SIGKILL, signal_handler) RuntimeError: (22, 'Invalid argument') – koleS Jun 25 '12 at 08:39
  • It still doesn't work for me, or I don't know how to use it. Could you maybe provide a more detailed example? – koleS Jun 25 '12 at 10:34
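The `RuntimeError` above comes from trying to install a handler for `SIGKILL`: the operating system never lets a process catch `SIGKILL` (or `SIGSTOP`), so `signal.signal()` rejects it. A more detailed sketch, assuming a POSIX system and combining this answer with the flag idea from the other answer: handlers for `SIGINT` (CTRL+C) and `SIGTERM` (plain `kill <pid>`) only set a shared `threading.Event`, and the worker threads check it and clean up themselves. The names `stop_event`, `request_stop`, `install_handlers`, `worker` and `main` are illustrative. Python only lets you install signal handlers from the main thread, and the handlers also run there.

```python
import signal
import threading
import time

stop_event = threading.Event()  # illustrative shared stop flag

def request_stop(signum, frame):
    # Runs in the main thread. It only sets the flag; the workers do their own clean-up.
    print("Got signal %d, asking threads to stop" % signum)
    stop_event.set()

def install_handlers():
    # Must be called from the main thread. SIGKILL and SIGSTOP cannot be caught at all.
    signal.signal(signal.SIGINT, request_stop)   # CTRL+C
    signal.signal(signal.SIGTERM, request_stop)  # plain `kill <pid>`

def worker(log):
    # Illustrative stand-in for the crawler or presenter loop.
    while not stop_event.is_set():
        time.sleep(0.05)  # one small unit of work
    log.append("worker closed its resources")

def main():
    install_handlers()
    log = []
    t = threading.Thread(target=worker, args=(log,))
    t.start()
    # Short timed joins keep the main thread free to run signal handlers
    # (a bare join() could block them in Python 2).
    while t.is_alive():
        t.join(0.5)
    return log
```

Calling `main()` from the script's entry point, then pressing CTRL+C or running `kill <pid>` from another terminal, should print the message and return once the worker has finished its current unit of work. `kill -9` (SIGKILL) still ends the process immediately, without any clean-up.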