Why is "if not (a and b)" faster than "if not a or not b"?

Question

On a whim, I recently tested these two methods with timeit, to see which evaluation method was faster:

import timeit

"""Test method returns True if either argument is falsey, else False."""

def and_chk((a, b)):
    if not (a and b):
        return True
    return False

def not_or_chk((a, b)):
    if not a or not b:
        return True
    return False

...and got these results:

 VALUES FOR a,b ->      0,0         0,1         1,0         1,1
        method
    and_chk(a,b)    0.95559     0.98646     0.95138     0.98788
 not_or_chk(a,b)    0.96804     1.07323     0.96015     1.05874
                                            ...seconds per 1,111,111 cycles.

The difference in efficiency is between one and nine percent, always in favour of if not (a and b), which is the opposite of what I might expect since I understand that if not a or not b will evaluate its terms (if not a and then if not b) in order, running the if block once it encounters a true expression (and there are no and clauses). In contrast, the and_chk method needs to evaluate both clauses before it can return any result to the if not.. that wraps it.

The timing results, however, disprove this understanding. How, then, is the if condition being evaluated? I am perfectly aware of the fact that this degree of microoptimization is practically, if not completely, pointless. I just want to understand how Python is going about it.

For completion's sake, this is how I set up timeit...

cyc = 1111111

bothFalse_and = iter([(0,0)] * cyc)
zeroTrue_and = iter([(1,0)] * cyc)
oneTrue_and = iter([(0,1)] * cyc)
bothTrue_and = iter([(1,1)] * cyc)

bothFalse_notor = iter([(0,0)] * cyc)
zeroTrue_notor = iter([(1,0)] * cyc)
oneTrue_notor = iter([(0,1)] * cyc)
bothTrue_notor = iter([(1,1)] * cyc)

time_bothFalse_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import bothFalse_and as tups, and_chk')
time_zeroTrue_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import zeroTrue_and as tups, and_chk')
time_oneTrue_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import oneTrue_and as tups, and_chk')
time_bothTrue_and = timeit.Timer('and_chk(next(tups))', 'from __main__ import bothTrue_and as tups, and_chk')

time_bothFalse_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import bothFalse_notor as tups, not_or_chk')
time_zeroTrue_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import zeroTrue_notor as tups, not_or_chk')
time_oneTrue_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import oneTrue_notor as tups, not_or_chk')
time_bothTrue_notor = timeit.Timer('not_or_chk(next(tups))', 'from __main__ import bothTrue_notor as tups, not_or_chk')

...then ran each timeit.Timer(..) function with .timeit(cyc) to get the results posted.

`if not (a and b)` there are two operators which are `not` and `and`. — Malik Brahimi, Apr 10 '15 at 00:42
`if not a or not b` there are three operators `not` and `or` and `not`. — Malik Brahimi, Apr 10 '15 at 00:43
They should both short circuit in the same way: `and` should stop if `a` is `False` without evaluating `b`, and `or` should stop if `not a` is true, without evaluating `not b`. — murgatroid99, Apr 10 '15 at 00:43
@MalikBrahimi I see. I didn't think it would evaluate operators once the statement was certain to run. It's sort of obvious, if you expect every operator to be considered instead of 'ignoring irrelevant' instructions once truth or falsehood is determined. :y — Augusta, Apr 10 '15 at 00:45
Never do `if x then True else False`. You could have written just `return not (a and b)` or `return not a or not b`. — Mephy, Apr 10 '15 at 00:45
@mephy Fair point, and normally I would, but I wanted to make the example for this as explicit as possible. — Augusta, Apr 10 '15 at 00:47
@HuuNguyen I'm not sure how to answer that, exactly; I'm sort of new... If I told you my IDE was IDLE, would that answer your question? :s — Augusta, Apr 10 '15 at 00:51
Python is a language specification, and there are implementations that interpret your source files and run it. Since you mentioned IDLE, I'm assuming you're using CPython, a Python interpreter written in C. This might help someone if they wanted to dive deeper into the internals to make sense of your results. — huu, Apr 10 '15 at 00:53
The `dis` module, which displays the Python bytecode (sort of a high-level assembly language, if you're not familiar) may be helpful. Try `import dis; print(dis.dis(f))` where `f` is your function. — Eli Rose, Apr 10 '15 at 01:07
FWIW, there's a fair bit of repetition in that `timeit` set up code! Reading highly repetitive code can be tricky, since on first glance it's not easy to know which bits are _supposed_ to be different, and which bits might be typos. :) The philosophy of avoiding such repetition is called [Don't Repeat Yourself (DRY)](http://en.wikipedia.org/wiki/Don%27t_repeat_yourself). For quick stuff, a little repetition is tolerable, but it's best avoided when practical in more serious code. — PM 2Ring, May 09 '15 at 11:04

skrrgwasme · Accepted Answer · 2015-05-08T16:29:47.007

TL;DR

The not_or_chk function requires two unary operations in addition to two jumps (in the worst case), while the and_chk function only has the two jumps (again, in the worst case).

Details

The dis module to the rescue! The dis module lets you take a look at the Python bytecode disassembly of your code. For example:

import dis

"""Test method returns True if either argument is falsey, else False."""

def and_chk((a, b)):
    if not (a and b):
        return True
    return False

def not_or_chk((a, b)):
    if not a or not b:
        return True
    return False

print("And Check:\n")
print(dis.dis(and_chk))

print("Or Check:\n")
print(dis.dis(not_or_chk))

Produces this output:

And Check:

  5           0 LOAD_FAST                0 (.0)
              3 UNPACK_SEQUENCE          2
              6 STORE_FAST               1 (a)
              9 STORE_FAST               2 (b)

  6          12 LOAD_FAST                1 (a)    * This block is the *
             15 JUMP_IF_FALSE_OR_POP    21        * disassembly of    *
             18 LOAD_FAST                2 (b)    * the "and_chk"     *
        >>   21 POP_JUMP_IF_TRUE        28        * function          *

  7          24 LOAD_GLOBAL              0 (True)
             27 RETURN_VALUE

  8     >>   28 LOAD_GLOBAL              1 (False)
             31 RETURN_VALUE
None
Or Check:

 10           0 LOAD_FAST                0 (.0)
              3 UNPACK_SEQUENCE          2
              6 STORE_FAST               1 (a)
              9 STORE_FAST               2 (b)

 11          12 LOAD_FAST                1 (a)    * This block is the *
             15 UNARY_NOT                         * disassembly of    *
             16 POP_JUMP_IF_TRUE        26        * the "not_or_chk"  *
             19 LOAD_FAST                2 (b)    * function          *
             22 UNARY_NOT
             23 POP_JUMP_IF_FALSE       30

 12     >>   26 LOAD_GLOBAL              0 (True)
             29 RETURN_VALUE

 13     >>   30 LOAD_GLOBAL              1 (False)
             33 RETURN_VALUE
None

Take a look at the two blocks of Python bytecode that I've marked with the asterisks. Those blocks are your two disassembled functions. Note that and_chk only has two jumps, and the calculations in the function are made while deciding whether or not to take the jump.

On the other hand, the not_or_chkfunction requires the not operation to be carried out twice in the worst case, in addition to the interpreter deciding whether or not to take the jump.

That's really fascinating! Thanks for showing me something like this. I hadn't thought of `not` as an operator (which, having had it pointed out earlier, I realize was a dumb thing to overlook), and it completely explains what is going on. More than that, though, I imagine I'm going to have a lot of fun with `dis`, so thanks for that, as well! :D — Augusta, Apr 10 '15 at 07:50

score 3 · Answer 2 · edited May 23 '17 at 12:09

I just noticed this question via your Meta SO question: Is it appropriate to share the results of my research toward solving my own minor questions?. As you mention in that question (and one of the tags on this question indicates), this sort of thing falls into the category of micro-optimization. Ideally, one shouldn't need to worry about such minor performance differences, and as Knuth says, premature optimization is the root of all evil. But I guess it is fun and instructive to investigate such matters, as it can give you a better feel for how Python works "under the hood".

Mephy's comment prompted me to see what the timing differences were for if-less versions of your functions. The results are interesting, IMHO. I also took the opportunity to streamline your testing procedure.

#!/usr/bin/env python

''' Do timeit tests on various implementations of NAND

    NAND returns True if either argument is falsey, else False.
    From https://stackoverflow.com/q/29551438/4014959
    Written by PM 2Ring 2015.04.09
'''

from timeit import Timer
import dis

def and_chk(a, b):
    return not (a and b)

def and_chk_if(a, b):
    if not (a and b):
        return True
    else:
        return False

def not_or_chk(a, b):
    return not a or not b

def not_or_chk_if(a, b):
    if not a or not b:
        return True
    else:
        return False


#All the functions
funcs = (
    and_chk,
    and_chk_if,
    not_or_chk,
    not_or_chk_if,
)

#Argument tuples to test the functions with
bools = (0, 1)
bool_tups = [(u, v) for u in bools for v in bools]


def show_dis():
    ''' Show the disassembly for each function '''
    print 'Disassembly'
    for func in funcs:
        fname = func.func_name
        print '\n%s' % fname
        dis.dis(func)
    print


def verify():
    ''' Verify that the functions actually perform as intended '''
    print 'Verifying...'
    for func in funcs:
        fname = func.func_name
        print '\n%s' % fname
        for args in bool_tups:
            print args, func(*args)
    print


def time_test(loops, reps):
    ''' Print timing stats for all the functions '''
    print 'Timing tests\nLoops = %d, Repetitions = %d' % (loops, reps)

    for func in funcs:
        fname = func.func_name
        print '\n%s' % fname
        setup = 'from __main__ import %s' % fname
        for args in bool_tups:
            t = Timer('%s%s' % (fname, args), setup)
            r = t.repeat(reps, loops)
            r.sort()
            print args, r


show_dis()
verify()
time_test(loops=520000, reps=3)

output

Disassembly

and_chk
 13           0 LOAD_FAST                0 (a)
              3 JUMP_IF_FALSE            4 (to 10)
              6 POP_TOP             
              7 LOAD_FAST                1 (b)
        >>   10 UNARY_NOT           
             11 RETURN_VALUE        

and_chk_if
 16           0 LOAD_FAST                0 (a)
              3 JUMP_IF_FALSE            4 (to 10)
              6 POP_TOP             
              7 LOAD_FAST                1 (b)
        >>   10 JUMP_IF_TRUE             5 (to 18)
             13 POP_TOP             

 17          14 LOAD_GLOBAL              0 (True)
             17 RETURN_VALUE        
        >>   18 POP_TOP             

 19          19 LOAD_GLOBAL              1 (False)
             22 RETURN_VALUE        
             23 LOAD_CONST               0 (None)
             26 RETURN_VALUE        

not_or_chk
 22           0 LOAD_FAST                0 (a)
              3 UNARY_NOT           
              4 JUMP_IF_TRUE             5 (to 12)
              7 POP_TOP             
              8 LOAD_FAST                1 (b)
             11 UNARY_NOT           
        >>   12 RETURN_VALUE        

not_or_chk_if
 25           0 LOAD_FAST                0 (a)
              3 UNARY_NOT           
              4 JUMP_IF_TRUE             8 (to 15)
              7 POP_TOP             
              8 LOAD_FAST                1 (b)
             11 UNARY_NOT           
             12 JUMP_IF_FALSE            5 (to 20)
        >>   15 POP_TOP             

 26          16 LOAD_GLOBAL              0 (True)
             19 RETURN_VALUE        
        >>   20 POP_TOP             

 28          21 LOAD_GLOBAL              1 (False)
             24 RETURN_VALUE        
             25 LOAD_CONST               0 (None)
             28 RETURN_VALUE        

Verifying...

and_chk
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False

and_chk_if
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False

not_or_chk
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False

not_or_chk_if
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False

Timing tests
Loops = 520000, Repetitions = 3

and_chk
(0, 0) [0.36773014068603516, 0.37793493270874023, 0.38387489318847656]
(0, 1) [0.36292791366577148, 0.37119913101196289, 0.37146902084350586]
(1, 0) [0.38673520088195801, 0.39340090751647949, 0.39670205116271973]
(1, 1) [0.38938498497009277, 0.39505791664123535, 0.40034103393554688]

and_chk_if
(0, 0) [0.4021449089050293, 0.40345501899719238, 0.41558098793029785]
(0, 1) [0.40686416625976562, 0.41213202476501465, 0.44800615310668945]
(1, 0) [0.4281308650970459, 0.42916202545166016, 0.43681907653808594]
(1, 1) [0.46246123313903809, 0.46759700775146484, 0.47544980049133301]

not_or_chk
(0, 0) [0.35435700416564941, 0.36368083953857422, 0.36867713928222656]
(0, 1) [0.35602092742919922, 0.35732793807983398, 0.36237406730651855]
(1, 0) [0.39550518989562988, 0.40660715103149414, 0.40977287292480469]
(1, 1) [0.4060060977935791, 0.4112389087677002, 0.41296815872192383]

not_or_chk_if
(0, 0) [0.4308779239654541, 0.44109201431274414, 0.45827698707580566]
(0, 1) [0.43600606918334961, 0.4370419979095459, 0.47623395919799805]
(1, 0) [0.48452401161193848, 0.48769593238830566, 0.49147915840148926]
(1, 1) [0.53107500076293945, 0.54048299789428711, 0.55434417724609375]

These timings were performed using Python 2.6.6 on a 2GHz Pentium 4 (single-core 32 bit) running Mepis 11 (a Debian family Linux distro).

You'll note that I've avoided using your next(tups) strategy to get the arguments for each function call and instead I'm passing the args directly, as constants. The time taken to perform next(tups) should be fairly constant, but it's best to eliminate such overheads, when practical, so that the reported time measurements more accurately reflect the performance of the code we're really interested in.

Also, it's usual to perform several repetitions of the timing loop and take the minimum value; FWIW, I generally do 3 to 5 reps. From the timeit docs:

Note

It’s tempting to calculate mean and standard deviation from the result vector and report these. However, this is not very useful. In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet; higher values in the result vector are typically not caused by variability in Python’s speed, but by other processes interfering with your timing accuracy. So the min() of the result is probably the only number you should be interested in. After that, you should look at the entire vector and apply common sense rather than statistics.

Your Meta post says that you want to perform & report on other micro-optimization experiments, so you might be interested in taking a look at some code I posted a few months ago that does time tests on various implementations of the factorial function.

I also noticed that you added the `else` block to the original code. Initially, I'd left this out on the basis of "if you're still asking, it's False" / "if it were anything else you'd have returned it by now," but was a bit dubious about because it seemed like something good (or at least 'better') practice would include. (["Don't bother!" says this other post.](http://stackoverflow.com/questions/1780384/) ;) ) Anyway, per the Meta post, thanks for your input! I'll take a close look at what you've posted here (esp. re: `next(tups)`!) and see how it effects some earlier tests. — Augusta, May 09 '15 at 22:48
@Augusta: That other post is a _slightly_ different case, though. It's arguing about the merits of `else: pass`, whereas here we have an `else` that's redundant due to the associated `if` block containing a `return`. — PM 2Ring, May 10 '15 at 12:02
@Augusta: If you look at the `dis` output of the `_chk_if` functions you'll see that they have **3** RETURN_VALUE commands. As well as the `True` and `False` returns there's also a `return None` added, since that's the default return for any Python function that doesn't have an explicit unconditional return value, and the interpreter isn't smart enough to figure out that the function must return via one branch or the other of the `if...else`. :) — PM 2Ring, May 10 '15 at 12:02
I'd figured that the other post was slightly different, but I was also assuming that if `else: pass` was forgettable, then `else: DoTheInevitableThing()` was just as good as just doing the inevitable thing. ;) I had *not* noticed the extra RETURN_VALUE command in there, actually! But is that an improvement or a liability? I assume that, because the RETURN_VALUE steps are strictly inaccessible, they do not actually add any 'weight' to the code, but even so, is it useful? I'm not saying that there's no merit in it, I just don't know (yet) what it is. :) — Augusta, May 10 '15 at 12:34
@Augusta: Yep. `else: DoTheInevitableThing()` is silly. :) And those extra `LOAD_CONST 0 (None); RETURN_VALUE` steps bloat the bytecode slightly, but they have virtually no impact on performance, since, as you say, they're inaccessible. Still, it's better to structure your functions so that there's always an unconditional return unless you want that automatic `return None` appended. BTW, you should check out the [SO Python Chat Room](http://chat.stackoverflow.com/rooms/6/python). — PM 2Ring, May 10 '15 at 14:37

Why is "if not (a and b)" faster than "if not a or not b"?

2 Answers2

TL;DR

Details

Linked