15

What is the recommended structure to write validate functions with many conditions? See these two examples. The first looks ugly, the second isn't very common, perhaps because assert is generally used to rule out unexpected behaviour. Are there better alternatives?

def validate(val):
  if cond1(val):
    return False
  if cond2(val):
    return False
  if cond3(val)
    return False
  return True

Or

def validate(val):
  try:
    assert cond1(val)
    assert cond2(val)
    assert cond3(val)
    return True
  except AssertionError:
    return False
Willem
  • 3,043
  • 2
  • 25
  • 37
  • 23
    You should **not** use `assert` to validate data! It's used to validate the logic of your program: you use it to test for conditions which should _never_ happen if your program logic is correct. If a program raises `AssertionError` that means your code is wrong & it needs debugging. – PM 2Ring Dec 09 '17 at 12:04
  • 11
    @PM2Ring Also, if code is optimised [assert statements are removed](https://stackoverflow.com/questions/1273211/disable-assertions-in-python). – Peter Wood Dec 09 '17 at 12:15
  • @PeterWood Indeed! I actually added that info to my answer before I saw your comment. ;) – PM 2Ring Dec 09 '17 at 12:21
  • If you are writing code that will be maintained by others I would use the first as it is eminently readable. Poetry in code is intellectually satisfying and shows a sharp mind but may come back to haunt. – copper.hat Dec 09 '17 at 18:19
  • 3
    @copper.hat "Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?" — Brian Kernighan, _The Elements of Programming Style_, 2nd edition, chapter 2. – PM 2Ring Dec 10 '17 at 06:47
  • @PM2Ring: I know it. Have been bitten by it. Yet stumble to understand what I wrote a month ago. – copper.hat Dec 11 '17 at 15:59

3 Answers3

41

A compact way to write that function is to use any and a generator expression:

def validate(val):
    conditions = (cond1, cond2, cond3)
    return not any(cond(val) for cond in conditions)

The any and all functions short-circuit, so they'll stop testing as soon as they have a definite result, i.e., any stops as soon as it hits a True-ish value, all stops as soon as it hits a False-ish value, so this form of testing is quite efficient.

I should also mention that it's much more efficient to pass a generator expression like this to all / any than a list comprehension. Because all / any stop testing as soon as they get a valid result, if you feed them from a generator then the generator will stop too, thus in the above code if cond(val) evaluates to a True-ish value no further conditions will be tested. But if you pass all / any a list comprehension, eg any([cond(val) for cond in conditions]) the whole list has to be be built before all / any can even start testing.


You haven't shown us the internal structure of your cond functions, but you did mention assert in your question, so I feel that the following remarks are in order here.

As I mentioned in the comments, assert should not be used to validate data, it's used to validate program logic. (Also, assertion-handling can be disabled via an -O command line option). The correct Exception to use for data with invalid values is ValueError, and for objects that are the wrong type, use TypeError. But bear in mind that exceptions are designed to handle situations that are exceptional.

If you expect a lot of malformed data then it's generally more efficient to use if based logic than exceptions. Python exception-handling is quite fast if the exception isn't actually raised, in fact it's faster than the equivalent if based code. However, if the exception is raised say more than 5-10% of the time, then the try...except based code will be noticeably slower than the if based equivalent.

Of course, sometimes using exceptions is the only sensible option, even though the situation isn't all that exceptional. A classic example is when you're converting a collection of numeric strings to actual numeric objects, so that strings that represent integers get converted to integer objects, other numeric strings get converted to floats, and other strings get left as strings. The standard way to do this in Python involves using exceptions. For example:

def convert(s):
    ''' Convert s to int or float, if possible '''
    try:
        return int(s)
    except ValueError:
        try:
            return float(s)
        except ValueError:
            return s

data = ['42', 'spam', '2.99792458E8']
out = [convert(u) for u in data]
print(out)
print([type(u) for u in out])

output

[42, 'spam', 299792458.0]
[<class 'int'>, <class 'str'>, <class 'float'>]

Using "Look Before You Leap" logic here is possible here, but it makes the code more complicated because you need to deal with possible minus signs and scientific notation.

PM 2Ring
  • 54,345
  • 6
  • 82
  • 182
  • 1
    Of course, these are not fully equivalent, because you don't short-circuit. If `cond1` is `not xs` and `cond2` is `xs[0] == 'bad'`, then passing `[]` for `xs` will result in an error. – wchargin Dec 09 '17 at 22:46
  • 1
    `any` might short-circuit, but the conditions are already evaluated when constructing the tuple, so it doesn't really short-circuit the evaluation like usual logic – This company is turning evil. Dec 09 '17 at 23:14
  • 4
    `cond1` , `cond2`, and `cond3` are functions. This code is equivalent to the original in terms of which functions will be called. – Theodore Norvell Dec 10 '17 at 00:54
  • 2
    @wchargin Look closer. First of all, `any` and `all` *do* both short-circuit. Second, the tuple contains the *functions*, not the results of calling those functions. Additionally, the argument passed to `any` is a generator. The functions are not invoked until they is reached by iterating through the generator, and the generator is advanced internally by the `any` and `all` functions (which stop advancing as soon as they return). – jpmc26 Dec 10 '17 at 02:28
  • "But bear in mind that exceptions are designed to handle situations that are exceptional." This is a common platitude that I continue to find more and more meaningless. What constitutes "exceptional"? Usually, what is actually meant is that they are intended for situations where *some failure* has occurred. – jpmc26 Dec 10 '17 at 02:34
  • @Kroltan No conditions were harmed in the construction of that tuple. :) It just contains function objects, the functions aren't called until the generator expression is executed, and as soon as one of the `cond` functions evaluates to True no further conditions are tested. – PM 2Ring Dec 10 '17 at 06:14
  • @jpmc26 Fair call. Above I stated that if the exception is raised more than 5-10% of the time then using exceptions is slower than using `if`. Admittedly, that's somewhat vague and broad. I _thought_ I had `timeit` code in my personal collection that I could link as evidence, but I don't, although it shouldn't be too hard to find some here on SO. – PM 2Ring Dec 10 '17 at 06:28
  • 1
    @jpmc26 (cont) OTOH, there's [this `timeit` test](https://stackoverflow.com/a/35451912/4014959) which compares the speed of `in` vs `get` for updating a dict used as a cache. `get` works by catching the `KeyError` exception, but it's doing it "under the hood", so it can work a little more efficiently than pure Python code. – PM 2Ring Dec 10 '17 at 06:28
  • 1
    @jpmc26: You're totally right. My mistake. (I missed that the `cond_i` were functions, not booleans; the rest is evident.) Thanks. – wchargin Dec 10 '17 at 18:04
5
def valid(value):
    return (is_this(value)
            and that(value)
            and other(value))

and operator exhibits "short-circuit" behavior in Python.

jfs
  • 399,953
  • 195
  • 994
  • 1,670
  • 2
    This will return `True` when `all` conditions evaluate as `True`. But it should return `False` if `any` conditions evaluate `True`. – Peter Wood Dec 09 '17 at 12:44
  • 2
    @PeterWood yes, this is the intent. the conditions are reversed (look at the names). To use OP's conditions and names: `return not (cond1(val) or cond2(val) or cond3(val))` I find my variant more readable. – jfs Dec 09 '17 at 12:50
2

The first way is much better. It can be prettified a little bit using any():

def validate_conditions(value):
    return not any((condition(value) for condition in conditions))
Evya
  • 2,325
  • 3
  • 11
  • 22
  • 1
    This will return `True` when `all` conditions evaluate as `True`. But it should return `False` if `any` conditions evaluate `True`. – Peter Wood Dec 09 '17 at 12:43
  • 3
    What Peter said. Also, it's _much_ more efficient to pass a generator expression to `all` / `any` than a list comprehension. `all` / `any` stop testing as soon as they get a valid result, so if you feed them from a generator then the generator will stop too. But if you pass them a list comp, the whole list has to be be built before `all` / `any` can even start testing. – PM 2Ring Dec 09 '17 at 12:49
  • Thank you for clarifying – Evya Dec 09 '17 at 12:51
  • Also, generator expressions are one of the few things where being less implicit is encouraged in Python. `((...))` can be replaced by `(...)`, or in other words you don't need the braces if a generator expression is the only argument. – Graipher Dec 10 '17 at 08:53