5

When checking for equality, is there any actual difference between speed and functionality of the following:

number = 'one'
if number == 'one' or number == 'two':

vs.

number = 'one'
if number in ['one', 'two']:
user2242044
  • 2
    Did you `timeit`? What happened? What if `number = 'two'`, or in a false-y case? Why not `in {'one', 'two'}`, if performance is important? – jonrsharpe Jun 15 '18 at 14:12
  • 2
    I think OP should timeit and answer himself. I'd up vote both posts then – Jay Calamari Jun 15 '18 at 14:16
  • Using a set is going to be faster than a list, so it is worth doing the timing check with a set vs. the `if` statement. – Grant Williams Jun 15 '18 at 14:25
  • 1
    @GrantWilliams: `set` only helps if the number of values to test is high enough; for two values, `set` won't help. And in this case, the OP is asking about Python 2, which doesn't optimize tests in `set` literals, so it would have to rebuild the `set` from scratch every time, even if the values are all literal constants, making it *guaranteed* slower unless you create the set as a global constant. On Python 3, there is an optimization to replace `set` literals of constants in this context with a constant `frozenset` so it can save you something there, but not in Py2. – ShadowRanger Jun 15 '18 at 14:28
  • @JayCalamari I ran timeit. Wasn't aware of it. Results are nearly identical on that. – user2242044 Jun 15 '18 at 14:51
  • @user2242044, awesome, thanks for the info. Interesting too – Jay Calamari Jun 15 '18 at 19:43
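The Python 3 behavior mentioned in the comments above (a `set` literal of constants in an `in` test being replaced by a constant `frozenset`) can be checked by inspecting the compiled function's constants. This is a minimal sketch that assumes CPython 3; `with_set` is a name made up for illustration, and other interpreters or versions may behave differently.

```python
def with_set(number):
    # Membership test against a set display made only of literal constants.
    return number in {'one', 'two'}

# On CPython 3 the set literal is expected to be stored as a single
# frozenset constant instead of being rebuilt on every call.
frozen_constants = [
    const for const in with_set.__code__.co_consts
    if isinstance(const, frozenset)
]

if __name__ == '__main__':
    print(frozen_constants)
```

On Python 2, per the same comment, no such constant is created, so the set would be rebuilt on each evaluation.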

2 Answers

7

If the values are literal constants (as in this case), `in` is likely to run faster. The (extremely limited) optimizer converts the list to a constant tuple that is loaded all at once, which reduces the bytecode work to two cheap loads and a single comparison operation/conditional jump. Chained `or`s, by contrast, involve two cheap loads and a comparison op/conditional jump for each test.

For two values, it might not help as much, but as the number of values increases, the bytecode savings over the alternative (especially if hits are uncommon, or evenly distributed across the options) can be meaningful.

The above applies specifically to the CPython reference interpreter; other interpreters may have lower per-bytecode costs that reduce or eliminate the differences in performance.
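The constant-tuple conversion described above can be observed directly. The following is a minimal sketch, assuming CPython 3; `with_in` and `with_or` are names made up for illustration, and the exact bytecode printed by `dis` varies between versions.

```python
import dis

def with_in(number):
    # Membership test against a list display of literal constants.
    return number in ['one', 'two']

def with_or(number):
    # Chained equality tests: one comparison per alternative.
    return number == 'one' or number == 'two'

# On CPython the list of literals is expected to be stored as one
# constant tuple in the function's code object.
folded = ('one', 'two') in with_in.__code__.co_consts

if __name__ == '__main__':
    print('constant tuple present:', folded)
    dis.dis(with_in)   # a single constant load plus one containment test
    dis.dis(with_or)   # a load/compare/jump sequence per alternative
```

Comparing the two disassemblies shows how the instruction count of the `or` chain grows with each added alternative while the `in` version stays the same length.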

A general advantage comes in if `number` is a more complicated expression: `my_expensive_function() in (...)` will generally outperform `my_expensive_function() == A or my_expensive_function() == B`, since the former only computes the value once, while the latter recomputes it for every comparison that runs.
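To illustrate the single-evaluation point, here is a small sketch in which `my_expensive_function` is a hypothetical stand-in that only counts how often it is called; `A`, `B` and the counter are likewise assumptions for the example.

```python
calls = 0

def my_expensive_function():
    # Hypothetical stand-in for a costly computation; it only counts calls.
    global calls
    calls += 1
    return 'two'

A, B = 'one', 'two'

calls = 0
hit_in = my_expensive_function() in (A, B)
calls_with_in = calls   # the left-hand side is evaluated once

calls = 0
hit_or = my_expensive_function() == A or my_expensive_function() == B
calls_with_or = calls   # evaluated again for each comparison that runs

if __name__ == '__main__':
    print(calls_with_in, calls_with_or)
```

Because the value matches the second alternative here, the `or` chain calls the function twice while the `in` form calls it once.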

That said, if the values in the tuple aren't constant literals, especially if hits will be common on the earlier values, `in` will usually be more expensive (because it must create the sequence for testing every time, even if it ends up only testing the first value).
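The cost of building the sequence every time can be made visible by recording which elements get evaluated. In this sketch `first`, `second` and the `evaluated` log are hypothetical helpers invented for the example.

```python
evaluated = []

def first():
    # Hypothetical non-constant element; records that it was evaluated.
    evaluated.append('first')
    return 'one'

def second():
    # Hypothetical non-constant element; records that it was evaluated.
    evaluated.append('second')
    return 'two'

number = 'one'

evaluated.clear()
hit_in = number in (first(), second())
evaluated_by_in = list(evaluated)   # whole tuple is built before the test

evaluated.clear()
hit_or = number == first() or number == second()
evaluated_by_or = list(evaluated)   # `or` short-circuits after the first hit

if __name__ == '__main__':
    print(evaluated_by_in, evaluated_by_or)
```

With a hit on the first value, the `in` form still evaluates both elements, whereas the `or` chain stops after the first comparison.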

ShadowRanger
-1

As for functionality: yes, the two approaches can behave differently in general. See https://stackoverflow.com/a/41957167/747744

Eugene Primako
  • The weirdness of `NaN` is worth noting, but it's not a "general difference" (`NaN` is nearly unique in its "not equal to itself" behavior). In terms of functionality, `number in ['one', 'two']` is equivalent to (aside from value load counts) `number is 'one' or number == 'one' or number is 'two' or number == 'two'`. For strings, that's purely a performance boost, not a behavior difference. – ShadowRanger Jun 15 '18 at 14:32
  • @ShadowRanger I'd agree if we were speaking about a list of strings or other default types. But objects of user-defined classes can implement their own `__eq__`, which can include nontrivial computations or treat similar objects as equal; there, checking for `is` can alter behaviour. The same goes for custom containers overriding `__contains__` (but I would say that is a rarer case). – Eugene Primako Jun 15 '18 at 14:43
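The identity-before-equality behavior discussed in these comments can be reproduced with `NaN` and with a user-defined class. This is a minimal sketch; `NeverEqual` is a hypothetical class written only to make the difference visible.

```python
nan = float('nan')

nan_with_eq = nan == nan     # NaN compares unequal to itself
nan_with_in = nan in [nan]   # list containment checks identity first

class NeverEqual:
    # Hypothetical class whose __eq__ always reports inequality.
    def __eq__(self, other):
        return False

obj = NeverEqual()
obj_with_eq = obj == obj     # uses __eq__ only
obj_with_in = obj in [obj]   # identity match short-circuits __eq__

if __name__ == '__main__':
    print(nan_with_eq, nan_with_in, obj_with_eq, obj_with_in)
```

For plain strings such as `'one'` and `'two'`, neither case arises, so both forms give the same result.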