Why is any (True for ... if cond) much faster than any (cond for ...)?

Question

Two similar ways to check whether a list contains an odd number:

any(x % 2 for x in a)
any(True for x in a if x % 2)

Timing results with a = [0] * 10000000 (five attempts each, times in seconds):

0.60  0.60  0.60  0.61  0.63  any(x % 2 for x in a)
0.36  0.36  0.36  0.37  0.37  any(True for x in a if x % 2)

Why is the second way almost twice as fast?

My testing code:

from timeit import repeat

setup = 'a = [0] * 10000000'

expressions = [
    'any(x % 2 for x in a)',
    'any(True for x in a if x % 2)',
]

for expression in expressions:
    times = sorted(repeat(expression, setup, number=1))
    print(*('%.2f ' % t for t in times), expression)

Try it online!

Another way to look at the filtering approach is `next((True for x in a if x % 2), False)`, which uses an explicit default to make it clear that the filtering generator doesn't yield any values that are false, even if `a` only contains even numbers. — kojiro, Aug 29 '21 at 16:42

score 92 · Accepted Answer · edited Sep 06 '21 at 05:17

92

The first method sends everything to any() whilst the second only sends to any() when there's an odd number, so any() has fewer elements to go through.

edited Sep 06 '21 at 05:17

David Buck

3,752
35
31
35

answered Aug 26 '21 at 12:45

Adrien Levert

964
6
15

10

The other answers have more details, but I think the brevity and clarity of this one makes it a nice top answer to read first. – no comment Aug 26 '21 at 13:22
13

It should be noted that since it's is a generator, "everything" doesn't necessarily mean that it processes all the elements of `a`. Both versions stop when they get to the first odd number. But in the test case, there are no odd numbers, so the generator does have to go through the whole list. – Barmar Aug 27 '21 at 13:29

score 79 · Answer 2 · edited Aug 27 '21 at 20:26

79

(x % 2 for x in a)

This generator produces a series of falsey values until it produces a truthy value (if it does), at which point any will stop iterating the generator and return True.

(True for x in a if x % 2)

This generator will only produce exactly one True value (if it does), at which point any will stop the iteration and return True.

The additional back and forth of yielding back to any and then fetching the next value from the generator in the first case accounts for the overhead.

edited Aug 27 '21 at 20:26

Kelly Bundy

23,480
7
29
65

answered Aug 26 '21 at 12:46

deceze

510,633
85
743
889

1

In the given example all the numbers are even, so no `True` values are produced. – bereal Aug 26 '21 at 12:47
2

So the second way is always faster or equally fast? Is it preferable in general then? So far I've been using the first way. – no comment Aug 26 '21 at 13:02
12

@don't That looks like a micro-optimisation to me, so you should only go for that if it actually makes a difference to you because it measurably speeds up your algorithm in actual practice. I wouldn't default to this style just because… – deceze Aug 26 '21 at 13:08
@deceze I have to agree, this is a serious case of micro-optimisation. We're talking about roughly 30ns extra time for running `any()` on a list of 10M elements. This question and the answers remain very interesting, but I'm going to stick to the more readable version, even if it's ever so slightly slower. With the faster version, I'd feel the need to write a comment saying why I'm using such a surprising line of code. – joanis Oct 05 '21 at 17:37

chepner · Answer 3 · 2021-08-26T13:27:51.080

TL;DR The slow version has to iterate over a long sequence of false values before returning False. The fast version "iterates" over an empty sequence before doing the same. The difference is the time it takes to construct the long-false sequence vs the empty sequence.

Let's look at the byte code generate by each. I've omitted the first section for each, as they are identical for the both. It's only the code for the generators involved that we need to look at.

In [5]: dis.dis('any(x%2 for x in a)')
[...]

Disassembly of <code object <genexpr> at 0x105e860e0, file "<dis>", line 1>:
  1           0 LOAD_FAST                0 (.0)
        >>    2 FOR_ITER                14 (to 18)
              4 STORE_FAST               1 (x)
              6 LOAD_FAST                1 (x)
              8 LOAD_CONST               0 (2)
             10 BINARY_MODULO
             12 YIELD_VALUE
             14 POP_TOP
             16 JUMP_ABSOLUTE            2
        >>   18 LOAD_CONST               1 (None)
             20 RETURN_VALUE


In [6]: dis.dis('any(True for x in a if x % 2)')
[...]

Disassembly of <code object <genexpr> at 0x105d993a0, file "<dis>", line 1>:
  1           0 LOAD_FAST                0 (.0)
        >>    2 FOR_ITER                18 (to 22)
              4 STORE_FAST               1 (x)
              6 LOAD_FAST                1 (x)
              8 LOAD_CONST               0 (2)
             10 BINARY_MODULO
             12 POP_JUMP_IF_FALSE        2
             14 LOAD_CONST               1 (True)
             16 YIELD_VALUE
             18 POP_TOP
             20 JUMP_ABSOLUTE            2
        >>   22 LOAD_CONST               2 (None)
             24 RETURN_VALUE

Both are identical up to the BINARY_MODULO instruction. After that, the slower version has to yield the resulting value for any to consume before moving on, while the second code immediately moves on to the next value. So basically, the slower code has to consume a long list of false (i.e., non-zero) values to determine that there are no true values. The faster code only needs to consume an empty list.

score 25 · Answer 4 · edited Sep 17 '21 at 09:59

The previous answers somewhat assume the reader is already familiar with the syntax and generators. I'd like to explain more for people who aren't.

The snippet

any(x % 2 for x in a)

is short syntax for:

any((x % 2 for x in a))

So what's happening is that (x % 2 for x in a) gets evaluated and the result value is then given to the any function. Just like print(21 * 2) computes the value 42, which is then given to the print function.

The expression (x % 2 for x in a) is a generator expression and its result is a generator iterator. That is an object that computes and hands out its values on demand. So in this case, when asked for a value, this iterator looks at the next value from a, computes its remainder modulo 2 (i.e., 0 for even and 1 for odd), and hands out that value. And then literally waits for possibly getting asked again for another value.

The any function is a second actor here. It gets the iterator as its argument, and then asks the iterator for more and more values, hoping for one that's true (note that 1 is true and 0 is false).

You can really think of it as two different persons interacting. The any-guy asking the iterator-guy for values. Again, note that the iterator-guy does not compute all values in advance. Only one at a time, whenever the any-guy asks for the next value. So it's really a back-and-forth between the two guys.

In the case of any(x % 2 for x in a), the iterator-guy, whenever asked by the any-guy for the next value, just computes one modulo value, hands it to the any-guy, and the any-guy has to judge it. Here the iterator-guy is like an incompetent junior developer, involving the manager for every single number, somewhat forcing them to hardcore micro-manage.

In the case of any(True for x in a if x % 2), the iterator-guy, whenever asked by the any-guy for the next value, doesn't mindlessly hand over just the modulo values. Instead, this iterator-guy judges the values himself, and only hands over something to the manager when there's something worthy to hand over. Namely only when he discovers an odd value (in which case he doesn't hand over 0 or 1, but True). Here the iterator-guy is like a competent senior developer doing all the work, and the manager can totally lay back and chill (and at the end of the day still take all the credit).

It should be clear that the second way is much more efficient, as they don't needlessly communicate for every ... single ... input number. Especially since your input a = [0] * 10000000 doesn't contain any odd numbers. The junior developer reports ten million zeros to the manager who has to judge all of them. With a constant back-and-forth between them for every zero. The senior developer judges all himself and reports nothing to his manager. Well, ok, both developers at the end additionally report that they're done, at which point the manager concludes False as the result of the whole any(...) expression).

I have read all three answers. This one was the most confusing one and I still have no idea what accounts for such discrepancy in performance between the two expressions in question. Isn't `True for x in a if x % 2` exactly what `any(x % 2 for x in a)` does? — oguz ismail, Aug 27 '21 at 19:27
@oguzismail `True for x in a if x % 2` isn't even a valid expression, can you clarify what you mean? — Kelly Bundy, Aug 27 '21 at 19:41
I mean, `True for x in a if x % 2` means *return true if `x % 2` is true for any element in `a`*, right? Isn't it what `any(x % 2 for x in a)` does? What does the latter do in between to achieve the same thing almost twice as slower? Without using workplace analogies and this-guy and that-guy terminology, please — oguz ismail, Aug 27 '21 at 19:53
@oguzismail No, `True for x in a if x % 2` literally means *nothing* in Python. That's [not valid syntax](https://tio.run/##K6gsycjPM7YoKPr/P6SoNFUhLb9IoUIhM08hUSEzDchSVTD6/x8A). Can you fix it and tell what you really mean? — Kelly Bundy, Aug 27 '21 at 20:01
Enclosed in `any(...)`, it means what I said. I have already told you what I *really mean*, and you're refusing to understand. Hence the downvote. Have a nice day. — oguz ismail, Aug 27 '21 at 20:12
@oguzismail I didn't want to know what "meaning" of the code you meant, I wanted to know what *code* you meant. Because I can't discuss code that I don't know. So did you mean `any(True for x in a if x % 2)`? And yes, both `any(x % 2 for x in a)` and `any(True for x in a if x % 2)` have the same meaning, pretty much the "return true if x % 2 is true for any element in a" that you said. But they achieve that in *different ways*, and one of them is faster. Just like `min(a)` and `sorted(a)[0]` mean the same thing ("minimum of `a`") but compute it in different ways that are differently fast. — Kelly Bundy, Aug 27 '21 at 20:21
@oguzismail The different ways that I described in the answer. Your *"first fetch the result of x % 2 and test whether it is True"* is unclear, as you're not saying which objects do those things. I see you're a bash guy, so let me try that. Compare `cat numbers.txt | print_modulo_2 | are_any_true` with `cat numbers.txt | print_true_only_if_odd | are_any_true`. The second way is faster because the lines with even numbers are discarded earlier, aren't sent through the pipe to `are_any_true`. Same in the Python code, where the generators play the `print_...` roles. . Is it clear now? — Kelly Bundy, Aug 27 '21 at 21:27
Yes, it is. You're saying the same thing as the top voted answer using more words. Thanks — oguz ismail, Aug 28 '21 at 04:13

score 7 · Answer 5 · answered Aug 30 '21 at 22:11

7

Number of "checking for falsiness" is not the actual reason because in faster version we can see an if statement which intern calls bool(). That checking is done "in advance" in faster case. So in both cases python has to go through all values and check truthiness of all of them.

The procedure that showed in Chepner's answer is indeed the answer of the question. Let's find when the next item in for loop can be requested...:

In faster case, it is just after the BINARY_MODULO, but in POP_JUMP_IF_FALSE statement it has to do a little bit of work to check the truthiness(if calls bool()) while in slower version it doesn't check that there. Up until now (-1) point for faster version. BUT in slower version it has to do three steps to reach the point to request for next item, YIELD_VALUE, POP_TOP, JUMP_ABSOLUTE. So (-3) for slower version... Those three steps causes the overhead because they can not be skipped.

In other words, faster version only does "checking" to reach the point to request for next item but slower version has to do "checking + those steps". Again both of them check for truthiness of all values.

The answer is the overhead of yielding.

answered Aug 30 '21 at 22:11

S.B

13,077
10
22
49

Little nitpick: `if` doesn't call `bool()`. – no comment Aug 30 '21 at 22:23
@don'ttalkjustcode It does, check [this link](https://tio.run/##PY07DsMgEET7PcV0QJ0msuTCd0iPbAIO0QrQgoucnmBZ8uvmo5nya5@cHs8ivTtea8UyEQZvH2DtljNbq6vnYC7/pEhMTaszRKxwK3NMuzJ3QXw7JOElhyfK2xczFm2IYsBQ11AZZ73/AQ) please. – S.B Aug 30 '21 at 22:27
No, that's the object's `__bool__` method. The claim `'bool is calling'` is incorrect. Something else calls it. – no comment Aug 30 '21 at 22:34
@don'ttalkjustcode `if obj:` is implicitly changed to `if bool(obj) is True:` or as you said python calls `__bool__` of that object. I just intended to say the check for truthiness is done in "if statement" just like in `any()`. in both cases this `__bool__` gets called. – S.B Aug 30 '21 at 22:38
No, you can see [`POP_JUMP_IF_FALSE` directly uses `PyObject_IsTrue`](https://github.com/python/cpython/blob/5246dbc2a12bf8e64e18efee2fdce02a350bbf09/Python/ceval.c#L3962-L3976), bypassing `bool()`. – no comment Aug 30 '21 at 22:42
@don'ttalkjustcode good point Interesting, So why in every cases that `__bool__` gets called? I mean in `Any()` in `if` statement... `[i for i in range(10) if A()]` prints that sentence 10 times, I thought this is what `bool()` does , bool() invokes `__bool__`. – S.B Aug 30 '21 at 22:59
`bool()` does (or rather might) invoke `__bool__`, it's just not the only thing that does. – no comment Aug 30 '21 at 23:05
2

@don'ttalkjustcode Since it's going to be way off topic, for last question, I said the `if obj:` statement does exactly what `if bool(obj) is True:` does, but you reject it and you said Something else calls `__bool__`. Could you tell how does this work? `if obj:` is also check for returning boolean value from `obj`'s `__bool__` method. – S.B Aug 30 '21 at 23:22
You can disassemble that as well, should also use [`PyObject_IsTrue`](https://github.com/python/cpython/blob/044e8d866fdde3804bdb2282c7d23a8074de8f6f/Objects/object.c#L1433). Which then calls the object's `__bool__` method if none of the earlier options there apply. At least I think so. None of the options there sound like they apply. – no comment Aug 30 '21 at 23:30
2

I think it's gotta be the `tp_as_number->nb_bool` option, maybe the `as_number` is just a misleading name. – no comment Aug 30 '21 at 23:38

Why is any (True for ... if cond) much faster than any (cond for ...)?

5 Answers5

Linked