1

The script I am writing should exit back to the shell prompt with a helpful message if the data to be processed is not exactly right. The user should fix the problems flagged until the script is happy and no longer exits with error messages. I am developing the script with TDD, so I write a pytest test before I write the function.

The most heavily up-voted answer here suggests that scripts be exited by calling sys.exit or raising SystemExit.

The function:

def istext(file_to_test):
    try:
        open(file_to_test).read(512)
    except UnicodeDecodeError:
        sys.exit('File {} must be encoded in UTF-8 (Unicode); try converting.'.format(file_to_test))

passes this test (where _non-text.png is a PNG file, i.e., not encoded in UTF-8):

def test_istext():
    with pytest.raises(SystemExit):
        istext('_non-text.png')

However, the script continues to run, and statements placed after the try/except block execute.

I want the script to completely exit every time so that the user can debug the data until it is correct, and the script will do what it is supposed to do (which is to process a directory full of UTF-8 text files, not PNG, JPG, PPTX... files).

Also tried:

The following also passes the test above by raising an exception that is a sub-class of SystemExit, but it also does not exit the script:

def istext(file_to_test):
    class NotUTF8Error(SystemExit): pass
    try:
        open(file_to_test).read(512)
    except UnicodeDecodeError:
        raise NotUTF8Error('File {} must be UTF-8.'.format(file_to_test))
Tom Baker

4 Answers

1

The try...except block is for catching an error and handling it internally. What you want to do is to re-raise the error.

def istext(file_to_test):
    try:
        open(file_to_test).read(512)
    except UnicodeDecodeError:
        print('File {} must be encoded in UTF-8 (Unicode); try converting.'.format(file_to_test))
        raise

This will print your message, then automatically re-raise the error you've caught.

Instead of just re-raising the original error, you might want to change the error type as well. To do that, give raise an instance of the exception you want, e.g.:

raise NameError("I'm the shown error message")
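The two options described in this answer can be sketched side by side. This is only an illustration: it works on raw bytes rather than a file so that it is self-contained, and `NotUTF8Error`, `check_reraise` and `check_new_type` are hypothetical names that do not appear in the answer.

```python
class NotUTF8Error(Exception):
    """Hypothetical error type, introduced only for this illustration."""


def check_reraise(data):
    # Option 1: report the problem, then re-raise the exception that was caught.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        print("Data must be encoded in UTF-8 (Unicode); try converting.")
        raise  # a bare raise re-raises the UnicodeDecodeError unchanged


def check_new_type(data):
    # Option 2: raise a different exception type that carries its own message.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        raise NotUTF8Error("Data must be encoded in UTF-8 (Unicode).")


# The first eight bytes of any PNG file; they are not valid UTF-8.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
```

With option 1 the caller sees the original `UnicodeDecodeError`; with option 2 the caller sees `NotUTF8Error` instead.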
Sudix
  • My first instinct had been to create a custom exception along the lines of `class NotUTF8Error(SystemExit):`. So you are suggesting that after `except UnicodeDecodeError:` I would `raise NotUTF8Error("Some message")`? – Tom Baker Sep 09 '17 at 18:54
  • Exactly. Or you specify in the definition of your error class already what the text is going to say, so that no re-raising is necessary. – Sudix Sep 09 '17 at 19:26
  • I tried the second suggestion (see amended question above), but like my original attempt, it passes the pytest but does not exit the script. The first suggestion (re-raising `UnicodeDecodeError` after printing the message) passes a modified pytest but without exiting the script. – Tom Baker Sep 10 '17 at 06:30
  • In that case, it is most likely that either Python automatically detects the encoding as Unicode, or that the symbol you copied into it isn't actually unicode. To force Python to use a specific encoding in the open command, use: open("Filename",encoding="UTF-8") ; Now, to make sure your file has unicode in it, copy this symbol into it: ⎌ – Sudix Sep 10 '17 at 09:35
  • Thank you, @Sudix, that's an interesting suggestion. In this case, however, my intention is to ensure that every visible file in the working directory is a text file. But since "text file" is hard to test for in a UTF-8 environment, I want to use the fact that `open(file).read(512)` will raise an exception for PNG, JPG, PPTX, DOCX... files as a simple test. – Tom Baker Sep 10 '17 at 11:02
  • For regular files, reading the extension, or if missing, the file signature will give you their type. If it's missing, things get tricky however... – Sudix Sep 10 '17 at 12:38
  • Never use `print` in a real application. You can use `logging.log()` or something else. – ADR Sep 12 '17 at 17:42
  • Why not? Is it like C's printf that's riddled with security risks? – Sudix Sep 12 '17 at 20:37
1

You can use raise Exception from exception syntax:

class MyException(SystemExit):
    pass


def istext(file_to_test):
    try:
        open(file_to_test).read(512)
    except UnicodeDecodeError as exception:
        raise MyException(f'File {file_to_test} must be encoded in UTF-8 (Unicode); try converting.') \
            from exception 

In this case you keep the original error message and add your own message to it.
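What `raise ... from ...` preserves can be checked directly: the original exception is stored on the new one as `__cause__`. The sketch below assumes an explicit `encoding="utf-8"` (not in the answer) so that the result does not depend on the locale, uses `str.format` so that it also runs on Python 3.5, and adds a hypothetical `demo` helper that writes a throwaway binary file.

```python
import os
import tempfile


class NotUTF8Error(SystemExit):
    """SystemExit subclass, named as in the question's comments."""


def istext(file_to_test):
    try:
        with open(file_to_test, encoding="utf-8") as handle:
            handle.read(512)
    except UnicodeDecodeError as exception:
        raise NotUTF8Error(
            "File {} must be encoded in UTF-8.".format(file_to_test)
        ) from exception


def demo():
    # Hypothetical helper: write the PNG signature to a temporary file,
    # run istext() on it, and hand back the exception that was raised.
    fd, path = tempfile.mkstemp(suffix=".png")
    with os.fdopen(fd, "wb") as handle:
        handle.write(b"\x89PNG\r\n\x1a\n")
    try:
        istext(path)
    except SystemExit as error:
        return error
    finally:
        os.remove(path)
    return None
```

The exception returned by `demo()` is a `NotUTF8Error` whose `code` attribute holds the new message and whose `__cause__` is the original `UnicodeDecodeError`.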

Tom Baker
ADR
  • Hmm, I could not get this example to work, even without the error message. – Tom Baker Sep 10 '17 at 09:47
  • After declaring the class `class NotUTF8Error(SystemExit): pass`, and after `except UnicodeDecodeError as exception:`, `raise NotUTF8Error from exception` gets a `SyntaxError`. – Tom Baker Sep 10 '17 at 09:55
  • Never mind - I got this to work, only now the pytest is failing with `NameError: name 'NotUTF8Error' is not defined`. – Tom Baker Sep 10 '17 at 10:14
  • Okay - even the pytest works now, but with `with pytest.raises(SystemExit)` and not, as I had expected with `with pytest.raises(NotUTF8Error)` - even though `NotUTF8Error` was declared with `class NotUTF8Error(SystemExit): pass`. The test fails with `NameError: name 'NotUTF8Error' is not defined`. Though I do not quite understand why, it all works now! – Tom Baker Sep 10 '17 at 10:57
  • I would like to upvote this answer, but I was not able to get it to work with the `f'File {file_to_test}...'` syntax - only with `'File {}...'.format(file)`, and I could not find any documentation for strings prefixed with `f`. – Tom Baker Sep 10 '17 at 11:10
  • The `f"{}"` syntax works only in Python 3.6 and later. Your question has the `python-3.x` tag, so I assumed you were using the latest version. Have you resolved all your problems? – ADR Sep 10 '17 at 21:07
  • Yes - my apologies - I couldn't find this solution and do indeed have only 3.5.2. I have removed the acceptance from my solution and am accepting this as the solution. Many thanks! – Tom Baker Sep 12 '17 at 18:46
  • I'm sorry for the confusion, but if I follow your example completely and create `NotUTF8Error` as a subclass of `Exception`, my pytest breaks. My pytest works if I create it as a subclass of `SystemExit`. – Tom Baker Sep 12 '17 at 19:13
  • Oh right... My mistake. You really do need to inherit from the `SystemExit` class, because that is the class you expect in `with pytest.raises(SystemExit):`. In fact, I only wanted to show you the `raise Exception from exception` construction. https://docs.python.org/3/reference/simple_stmts.html#raise Is everything all right now, or do you need more help? – ADR Sep 13 '17 at 09:19
  • I have edited the code above to change `Exception` to `SystemExit`. I believe it is perfect now, but my edit needs to be peer-reviewed before it is visible to others. When my edit is accepted, I will accept this as the answer. Thank you very much for the links and explanation - much appreciated! – Tom Baker Sep 13 '17 at 19:11
0

Your problem is not how to exit a program (sys.exit() works fine). Your problem is that your test scenario is not raising a UnicodeDecodeError.

Here's a simplified version of your example. It works as expected:

import pytest
import sys

def foo(n):
    try:
        1/n
    except ZeroDivisionError as e:
        sys.exit('blah')

def test_foo():
    # Assertion passes.
    with pytest.raises(SystemExit):
        foo(0)
    # Assertion fails: "DID NOT RAISE <type 'exceptions.SystemExit'>"
    with pytest.raises(SystemExit):
        foo(9)

Add some diagnostic printing to your code to learn more. For example:

def istext(file_to_test):
    try:
        content = open(file_to_test).read(512)
        # If you see this, no error occurred. Maybe your source
        # file needs different content to trigger UnicodeDecodeError.
        print('CONTENT len()', len(content))
    except UnicodeDecodeError:
        sys.exit('blah')
    except Exception as e:
        # Maybe some other type of error should also be handled?
        ...
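One possible reason a binary file reads without error, offered as an assumption rather than a diagnosis of the asker's setup: when `open()` is called without an `encoding` argument, Python uses the locale's preferred encoding, and an 8-bit encoding such as Latin-1 accepts every byte. The sketch below shows the difference on the first eight bytes of a PNG file; `decodes_as` and `PNG_SIGNATURE` are names invented for this illustration.

```python
import locale

# The first eight bytes of any PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decodes_as(data, encoding):
    """Return True if the bytes decode cleanly with the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


def default_text_encoding():
    # The encoding that open() falls back to when none is given.
    return locale.getpreferredencoding(False)
```

Here `decodes_as(PNG_SIGNATURE, "utf-8")` is False while `decodes_as(PNG_SIGNATURE, "latin-1")` is True, so passing `encoding="utf-8"` to `open()` makes the check independent of the machine's locale.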
FMc
  • This helpfully clarifies that pytest is not testing whether `1/0` raises `ZeroDivisionError`, but whether `sys.exit` raises `SystemExit`. Thanks! – Tom Baker Sep 10 '17 at 10:49
0

In the end, what worked is similar to what @ADR proposed, with one difference: I was not able to get the formatted string syntax shown above to work correctly (f'File {file_to_test} must...'), nor could I find documentation of the f prefix for strings.

My slightly less elegant solution, then, for the (renamed) function:

def is_utf8(file):
    class NotUTF8Error(SystemExit): pass
    try:
        open(file).read(512)
    except UnicodeDecodeError as e:
        raise NotUTF8Error('File {} not UTF-8: convert or delete, then retry.'.format(file)) from e

passes the pytest:

def test_is_utf81():
    with pytest.raises(SystemExit):
        is_utf8('/Users/tbaker/github/tombaker/mklists/mklists/_non-text.png')
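The test above targets `SystemExit` because `NotUTF8Error` is defined inside `is_utf8`, so the name does not exist in the test module's scope. A sketch of one alternative, assuming the class is moved to module level and assuming an explicit `encoding="utf-8"` (neither is part of the solution above):

```python
class NotUTF8Error(SystemExit):
    """Defined at module level so that other modules can import it by name."""


def is_utf8(file):
    try:
        with open(file, encoding="utf-8") as handle:
            handle.read(512)
    except UnicodeDecodeError as e:
        raise NotUTF8Error(
            "File {} not UTF-8: convert or delete, then retry.".format(file)
        ) from e
```

A test module that imports both names can then use `with pytest.raises(NotUTF8Error):`. The existing `with pytest.raises(SystemExit):` keeps passing as well, since `pytest.raises` also accepts subclasses of the expected exception.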
Tom Baker
  • Are you serious? You don't like my solution anymore because I use one simple feature of Python 3.6? https://docs.python.org/3/whatsnew/3.6.html https://www.python.org/dev/peps/pep-0498/ – ADR Sep 12 '17 at 17:37
  • @ADR thank you for the reference - My apologies - I looked for that feature but didn't find it; it didn't work for me because I have Python 3.5.2. I'm removing the acceptance and accepting your answer! – Tom Baker Sep 12 '17 at 18:44