
I have a Python script that contains a big loop which reads a file and does some processing (I am using several packages like urllib2, httplib2 and BeautifulSoup).

It looks like this:

try:
    with open(fileName, 'r') as file :
        for i, line in enumerate(file):
            try:
                # a lot of code
                # ....
                # ....
            except urllib2.HTTPError:
                print "\n >>> HTTPError"
            # a lot of other exceptions
            # ....
            except (KeyboardInterrupt, SystemExit):
                print "Process manually stopped"
                raise
            except Exception, e:
                print(repr(e))
except (KeyboardInterrupt, SystemExit):
    print "Process manually stopped"
    # some stuff

The problem is that the program stops when I hit Ctrl+C, but the KeyboardInterrupt is not caught by either of my two except clauses, even though I am sure the program is inside the loop at that moment (and thus at least inside the outer try/except).

How is that possible? At first I thought it was because one of the packages I'm using doesn't handle exceptions correctly (e.g. by using a bare "except:"), but if that were the case, my script wouldn't stop. But the script DOES stop, so the exception should be caught by at least one of my two except clauses, right?
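The reasoning above can be checked in isolation. In this sketch, `library_call` is a hypothetical stand-in for package code that uses a bare `except:`, and it raises KeyboardInterrupt itself to simulate Ctrl-C. Because KeyboardInterrupt derives from BaseException rather than Exception, a bare `except:` swallows it and the loop keeps running, while `except Exception` lets it through. So a package that swallowed the interrupt would make the script ignore Ctrl-C rather than stop. The code should run under both Python 2.7 and Python 3.

```python
def library_call(i):
    """Hypothetical library function that wraps its work in a bare except."""
    try:
        if i == 1:
            raise KeyboardInterrupt  # simulate Ctrl-C arriving inside the library
    except:  # a bare except also catches KeyboardInterrupt and SystemExit
        pass


def run_loop():
    processed = []
    try:
        for i in range(3):
            library_call(i)
            processed.append(i)
    except KeyboardInterrupt:
        return processed, "caught by the caller"
    return processed, "interrupt swallowed, loop kept running"


result = run_loop()
print(result)  # ([0, 1, 2], 'interrupt swallowed, loop kept running')

# KeyboardInterrupt is not an Exception subclass, so "except Exception" skips it
print(issubclass(KeyboardInterrupt, Exception))      # False
print(issubclass(KeyboardInterrupt, BaseException))  # True
```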

Where am I wrong?

Thanks in advance!

EDIT:

After adding a finally: clause after the try/except and printing the traceback in both try/except blocks, it usually displays None when I hit Ctrl+C, but once I managed to get this (it seems to come from urllib2, but I don't know whether that is the reason I can't catch a KeyboardInterrupt):

Traceback (most recent call last):
  File "/home/darcot/code/Crawler/crawler.py", line 294, in get_articles_from_file
    content = Extractor(extractor='ArticleExtractor', url=url).getText()
  File "/usr/local/lib/python2.7/site-packages/boilerpipe/extract/__init__.py", line 36, in __init__
    connection  = urllib2.urlopen(request)
  File "/usr/local/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/local/lib/python2.7/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/local/lib/python2.7/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/local/lib/python2.7/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.7/urllib2.py", line 1173, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/local/lib/python2.7/urllib2.py", line 1148, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 4] Interrupted system call>
Josh Correia
Thematrixme
  • The code you posted seems pretty clean. I tried executing a nested try-catch too, and it works fine. Maybe the culprit is inside the "a lot of code" part. `NOTE:` Avoid nested try-catch blocks. – Yogeesh Seralathan Aug 12 '14 at 14:41
  • Why are you catching the KeyboardInterrupt exception twice? Which Python version are you using? 2.7.8? I tried something similar to your example with Python 3.4 and it works perfectly... – flammi88 Aug 12 '14 at 14:42
  • @flammi88 I caught it twice because I wasn't sure if it would be caught by the "except Exception, e". But it shouldn't, and the big try-except should be enough to catch the exception. – Thematrixme Aug 12 '14 at 14:45
  • And I'm using Python 2.7 – Thematrixme Aug 12 '14 at 14:45
  • Do you get any traceback log at all? – Yoel Aug 12 '14 at 14:49
  • And I know that it should work this way. I tried a simpler version of this, without my whole code in the loop and it worked fine. Thus the issue seems to come from one of the packages I use, but in order to find where the issue is, I'd like to know what can cause it. Because I can't even think of a way that could stop the script without raising this exception... – Thematrixme Aug 12 '14 at 14:50
  • @Yoel No, I just see ^C on the terminal and the program stops. – Thematrixme Aug 12 '14 at 14:52
  • I have tried out the code you provided with Python 2.7.8, and catching the KeyboardInterrupt exception works nicely. I replaced the first 'a lot of code' comment with a pass statement to get it working and added a few additional prints to see which except clause gets fired... From my point of view, your problem exists in the block you left out... – flammi88 Aug 12 '14 at 14:54
  • @flammi88 Yes, that's what I think too, but how is it even possible that Ctrl-C stops the program without raising this exception? – Thematrixme Aug 12 '14 at 14:59
  • I can imagine cases, but it's hard to say without any information about what you are trying to do in the 'a lot of code' section. Some sort of multi-threading maybe? (I am not sure how Python reacts in these cases; I have never done any multi-threading in Python before...) – flammi88 Aug 12 '14 at 15:03
  • Try adding a `finally` clause and print the traceback there... – Yoel Aug 12 '14 at 15:06
  • @flammi88 There is no multithreading (not that I am aware of, at least; there might be some in the packages I use). Basically I just loop over a file that contains URLs of newspaper articles, and for each URL I extract the article (using packages like BeautifulSoup, newspaper, boilerpipe, urllib2, httplib2) and insert it into a database. I don't think there is anything unusual. – Thematrixme Aug 12 '14 at 15:14
  • @Yoel Should I put the finally after both excepts? – Thematrixme Aug 12 '14 at 15:15
  • Inserting it in the inner `try-except` block should suffice – Yoel Aug 12 '14 at 15:28
  • @Yoel I managed to get something but since I got it only once I am not sure it is the cause to my issue. Please take a look as I edited the post. – Thematrixme Aug 12 '14 at 15:44
  • I am not sure if your problem is related to the urllib2 stack trace. I assume that your problem is caused by one of your non-pure-Python packages... If you really want to, you may be able to debug this further with gdb --args python. Gdb will drop to a shell when you hit Ctrl-C, and you can get a C backtrace by entering bt there... (I have to admit that this might be overkill to solve your problem, but I have no better idea.) – flammi88 Aug 12 '14 at 15:56
  • One further question: on which platform are you developing this? Linux/Windows/Mac? I assumed Linux in my previous comment; on other platforms, debugging Python might work differently... – flammi88 Aug 12 '14 at 15:59
  • How have you printed the traceback? Have you invoked `traceback.print_stack()`? – Yoel Aug 12 '14 at 16:03
  • @Yoel Yes, I used that, and most of the time it prints just `None`. I only managed once to get something else (what I added to the post), but as @flammi88 suggested, that seems to be a JPype issue. – Thematrixme Aug 12 '14 at 19:56
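For the traceback printing discussed in these comments, here is a minimal sketch. Inside an `except` block, `traceback.format_exc()` returns the traceback of the exception currently being handled as a string. `traceback.print_exc()` writes that traceback to stderr and returns `None`, so wrapping it in `print` adds a stray `None`. `traceback.print_stack()` prints the current call stack, not the exception's traceback. `risky` and `process` are hypothetical stand-ins for the loop body.

```python
import traceback


def risky(n):
    # Hypothetical stand-in for the "a lot of code" section
    return 10 // n


def process(values):
    log = []
    for n in values:
        try:
            log.append(risky(n))
        except Exception:
            # format_exc() returns the traceback of the exception being handled
            # as a string, so it can be logged or printed explicitly
            log.append(traceback.format_exc())
    return log


log = process([5, 0])
for entry in log:
    print(entry)
```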

2 Answers


I already suggested in my comments on the question that this problem is likely to be caused by the code section that is left out of the question. However, the exact code should not be relevant, as Python normally raises a KeyboardInterrupt exception when Python code is interrupted by Ctrl-C.

You mentioned in the comments that you use the boilerpipe Python package. This package uses JPype to create the language binding to Java... I can reproduce your problem with the following Python program:

from boilerpipe.extract import Extractor  # removing this import makes the problem disappear
import time

try:
    for i in range(10):
        time.sleep(1)  # press Ctrl-C while the loop is running
except KeyboardInterrupt:
    print("Keyboard Interrupt Exception")

If you interrupt this program with Ctrl-C, the exception is not raised. It seems that the program is terminated immediately, leaving the Python interpreter no chance to raise the exception. When the boilerpipe import is removed, the problem disappears...

A debugging session with gdb shows that a large number of threads are started when boilerpipe is imported:

gdb --args python boilerpipe_test.py
[...]
(gdb) run
Starting program: /home/fabian/Experimente/pykeyinterrupt/bin/python boilerpipe_test.py
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7fffef62b700 (LWP 3840)]
[New Thread 0x7fffef52a700 (LWP 3841)]
[New Thread 0x7fffef429700 (LWP 3842)]
[New Thread 0x7fffef328700 (LWP 3843)]
[New Thread 0x7fffed99a700 (LWP 3844)]
[New Thread 0x7fffed899700 (LWP 3845)]
[New Thread 0x7fffed798700 (LWP 3846)]
[New Thread 0x7fffed697700 (LWP 3847)]
[New Thread 0x7fffed596700 (LWP 3848)]
[New Thread 0x7fffed495700 (LWP 3849)]
[New Thread 0x7fffed394700 (LWP 3850)]
[New Thread 0x7fffed293700 (LWP 3851)]
[New Thread 0x7fffed192700 (LWP 3852)]

gdb session without the boilerpipe import:

gdb --args python boilerpipe_test.py
[...]
(gdb) r
Starting program: /home/fabian/Experimente/pykeyinterrupt/bin/python boilerpipe_test.py
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7529533 in __select_nocancel () from /usr/lib/libc.so.6
(gdb) signal 2
Continuing with signal SIGINT.
Keyboard Interrupt Exception
[Inferior 1 (process 3904) exited normally]

So I assume that your Ctrl-C signal gets handled in a different thread, or that JPype does other odd things that break the handling of Ctrl-C.

EDIT: As a possible workaround, you can register a signal handler for the SIGINT signal that the process receives when you hit Ctrl-C. The signal handler is still called even when boilerpipe and JPype are imported, so you are notified when the user hits Ctrl-C and can handle that event at a central point in your program. You can terminate the script in this handler if you want to; if you don't, the script continues running where it was interrupted once the handler function returns. See the example below:

from boilerpipe.extract import Extractor
import time
import signal
import sys

def interrupt_handler(signum, frame):
    print("Signal handler!!!")
    # Terminate the process here: once a handler is installed,
    # Ctrl-C no longer ends the process by itself
    sys.exit(-2)

signal.signal(signal.SIGINT, interrupt_handler)

try:
    for i in range(10):
        time.sleep(1)
    # your_url = "http://www.zeit.de"
    # extractor = Extractor(extractor='ArticleExtractor', url=your_url)
except KeyboardInterrupt:
    print("Keyboard Interrupt Exception")
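A possible variant of the same workaround is sketched below under one assumption: re-registering a handler after the import overrides JPype's handling in the same way the custom handler above does. This has not been verified against boilerpipe/JPype here. `signal.default_int_handler` is the handler Python itself normally installs for SIGINT, and it raises KeyboardInterrupt. Installing it again means the existing `except KeyboardInterrupt` clauses can stay unchanged. The `os.kill` call only simulates Ctrl-C on a POSIX system, so the sketch can run unattended.

```python
import os
import signal
import time

# In the real script this line would come right after the boilerpipe import
# (from boilerpipe.extract import Extractor). Re-install Python's default
# SIGINT handler, which raises KeyboardInterrupt in the main thread.
signal.signal(signal.SIGINT, signal.default_int_handler)


def main():
    try:
        for i in range(10):
            if i == 2:
                os.kill(os.getpid(), signal.SIGINT)  # simulate Ctrl-C (POSIX only)
            time.sleep(0.1)
    except KeyboardInterrupt:
        return "Keyboard Interrupt Exception"
    return "finished without interruption"


outcome = main()
print(outcome)
```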
Yoel
flammi88
  • I think it is the only reasonable answer, thank you very much! However, I now have to face a tricky issue: do you have any idea how to find a workaround? Maybe by defining another key combination that would allow me to break the loop and exit the program properly, and that won't be handled weirdly by JPype? – Thematrixme Aug 12 '14 at 20:02
  • Please see if my updated answer can help you. Without looking into the JPype code, that's the only solution I am aware of... I hope this can help you. – flammi88 Aug 12 '14 at 20:11
  • That definitely helped me, thank you so much! The workaround works perfectly. – Thematrixme Aug 13 '14 at 08:22
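On breaking the loop and exiting properly: instead of calling `sys.exit()` inside the signal handler, one common pattern is to let the handler only set a flag. The loop then checks that flag and stops after the current item. This is a sketch of that pattern. `stop_requested`, `request_stop`, `process_line` and `crawl` are hypothetical names, and `os.kill` simulates Ctrl-C on a POSIX system so the code can run unattended.

```python
import os
import signal

stop_requested = False


def request_stop(signum, frame):
    # Only record the request; the main loop decides when to stop
    global stop_requested
    stop_requested = True


signal.signal(signal.SIGINT, request_stop)


def process_line(line):
    # Hypothetical stand-in for the per-URL extraction and database insert
    return line.upper()


def crawl(lines):
    done = []
    for i, line in enumerate(lines):
        if stop_requested:
            print("Process manually stopped after %d lines" % len(done))
            break
        done.append(process_line(line))
        if i == 1:
            os.kill(os.getpid(), signal.SIGINT)  # simulate Ctrl-C (POSIX only)
    return done


result = crawl(["a", "b", "c", "d"])
print(result)
```

Because the handler returns normally, the current item is finished before the loop exits, which leaves room for cleanup such as committing the database transaction.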

Most likely you are pressing Ctrl-C while your script is outside the try block, and therefore it is not catching the signal.

Drudi
  • I am definitely sure that I hit Ctrl-C while the program is in the loop (that's not difficult to know) and since my loop is inside a try block that can't be the right answer. – Thematrixme Aug 12 '14 at 19:55