0

My code is processing data files that contains integer values as strings terminating with a .0. For example, 1234 is "1234.0". These strings should be converted to integers, which is simple enough by first converting to a float and then an int:

foo = "1234.0"
foo = int(float(foo))

There is a possibility that the string could be "nan" however, in which case this code will throw an error when converting to int:

ValueError: cannot convert float NaN to integer

These values should be changed to None. I see two approaches to handling this:

foo = "nan"
try:
    foo = int(float(foo))
except ValueError as e:
    if 'cannot convert float NaN to integer' in str(e):
        foo = None
    else:
        raise

or

import math

foo = "nan"
foo = float(foo)
if math.isnan(foo):
    foo = None
else:
    foo = int(foo)

As exceptions as flow control is considered a good practice in Python I would usually favour something like the first option. However, this approach of explicitly checking the message of the exception as using that as flow control (i.e, if this message then do this otherwise raise) feels intuitively 'dirty'.

Is checking for specific error messages as a form of flow control generally considered a good or bad practice in Python and why?

Duck Hunt Duo
  • 299
  • 1
  • 11
  • Not particularly good or bad. I think that generally, exceptions are used if they are a rare occurrences (exceptional), otherwise an if-else statement. – 9769953 Feb 11 '20 at 10:17
  • 2
    This will be a lot of fun if you ever have to internationalize your application, like supporting Spanish or French in the error message – ChatterOne Feb 11 '20 at 10:23
  • @ChatterOne, very fair point, although in my particular case I can say with 100% certainty that we'll never be using a language other than English. Could be very relevant to other people seeking an answer to this question though. – Duck Hunt Duo Feb 11 '20 at 10:31
  • @OP Do you want your specific case answered, or do you prefer a general answer (as per your question title), where your code serves as an example. Because the two may very well differ. – 9769953 Feb 11 '20 at 10:48

2 Answers2

1

This is bad practice, because it tightly couples your code to the exact error message, which is not specified in the language documentation. If a new version of Python changes this error message even slightly, your code will break.

If you want to treat non-numeric strings differently to non-integer numeric strings, you should catch the exceptions separately:

def str_to_int(s):
    try:
        float_value = float(s)
    except ValueError:
        print('not a float')
        raise
    else:
        try:
            return int(float_value)
        except ValueError:
            print('not an int')
            return None
kaya3
  • 47,440
  • 4
  • 68
  • 97
  • 1
    Can you provide a reference for your answer? – Daniel Lima Feb 11 '20 at 10:18
  • 1
    @DanielLima Which claim do you think needs to be supported by a reference? – kaya3 Feb 11 '20 at 10:21
  • Thanks for your answer. I think your code snippet misses one of the key issues though, which is that I'd only like None to be returned in the event that the string is "nan". With this code, None will be returned for *any* ValueError (if the value of the string is set to "monkey" for example), whereas I'd really prefer to see an error raised in the result that our data files contain any unexpected values so we can consider it on a case-by-case basis. Is there a better approach for that? – Duck Hunt Duo Feb 11 '20 at 10:29
  • @DuckHuntDuo Of course you can change one or both of the `return None` statements to do whatever you want instead. The point is that by catching them separately you can handle them separately. I've edited to show this; if you don't need to do anything in between catching and re-raising the first `ValueError` then you could get rid of the outer `try` block entirely. – kaya3 Feb 11 '20 at 10:34
1

Generally, you should be validating your inputs. Avoid trying to cast something you don't know what is. That said, if you know how your input may look like (either integers terminated with .0, or 'nan'), then you should validate just that. Example:

import re
from typing import Optional


def validate(maybe_number: str) -> Optional[int]:
    # following regex matches string with at least one digit followed by '.0'
    number_pattern = re.compile(r"^\d+\.0$")

    if number_pattern.match(maybe_number):
        return int(maybe_number)

    if maybe_number == "nan":
        return None

    raise ValueError("some useful message")
Libor
  • 79
  • 2
  • 10