
In my code I used to have comparisons like `if a == b or a == c or a == d:` fairly frequently. At some point I discovered that these can easily be shortened to `if a in {b, c, d}:`, or to `if a in (b, c, d):` if the values aren't hashable. However, I have never seen such a construction in anyone else's code. This is probably because either:

  1. The == way is slower.
  2. The == way is more pythonic.
  3. They actually do subtly different things.
  4. I have, by chance, not looked at any code which required either.
  5. I have seen it and just ignored or forgotten it.
  6. One shouldn't need to have comparisons like this because one's code should be better elsewhere.
  7. Nobody has thought of the in way except me.

Which reason, if any, is it?

LeopardShark

3 Answers


For simple values (i.e. not expressions or NaNs), `if a == b or a == c` and `if a in <iterable of b and c>` are equivalent.
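The NaN exception can be seen in a short sketch. It relies on the rule that built-in containers test membership with `x is y or x == y`, so identity is checked before equality. The variable names are illustrative only.

```python
# Membership tests on built-in containers use `x is y or x == y`,
# so a NaN object is "found" in a container holding that same object,
# even though NaN never compares equal to itself.
nan = float("nan")
a = b = c = nan

by_equality = (a == b or a == c)   # NaN != NaN, so both comparisons fail
by_set = a in {b, c}               # identity match inside the set
by_tuple = a in (b, c)             # identity match inside the tuple

# A *different* NaN object is not found, because identity no longer matches.
other_nan_found = float("nan") in (nan,)

print(by_equality, by_set, by_tuple, other_nan_found)
# False True True False
```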

If the values are hashable, it's better to use in with a set literal instead of tuple or list literals:

if a in {b, c}: ...

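When every element is a constant literal, CPython's peephole optimiser is often able to replace the set display with a cached frozenset object, so it is not rebuilt on each evaluation. Membership tests against sets are O(1) operations on average, whereas tuples and lists are searched linearly.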
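One way to check whether a given set literal is cached is to look at the constants stored in the compiled code. This is only a sketch: it assumes a CPython 3 interpreter, the result is an implementation detail that may differ elsewhere, and the helper name `has_frozenset_constant` is made up for illustration.

```python
def has_frozenset_constant(source):
    """Return True if the compiled expression stores a frozenset constant."""
    code = compile(source, "<example>", "eval")
    return any(isinstance(const, frozenset) for const in code.co_consts)


# All elements are literals: the optimiser can precompute the set.
literal_case = has_frozenset_constant("a in {1, 2, 3}")

# Elements are names: the set has to be built each time the test runs.
variable_case = has_frozenset_constant("a in {b, c, d}")

print(literal_case, variable_case)
```

On CPython 3 I would expect the first result to be true and the second false, which means the caching only helps when the candidates are literals rather than variables.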

Eugene Yarmash
  • Nitpick: they're not quite equivalent, though in many cases that occur in practice they're likely to be interchangeable. Consider the case where `a = b = c = float('nan')`: `a == b or a == c` is `False`, while `a in {b, c}` is `True`. – Mark Dickinson Aug 21 '17 at 15:11
  • @MarkDickinson Why is the first one `False`? – Ma0 Aug 21 '17 at 15:12
  • @Ev.Kounis Because under the IEEE 754 standard, NaN is the only value that is not equal to itself; that is exactly how you should test for it! – jlandercy Aug 21 '17 at 15:15
  • @Ev.Kounis `a = b = c = float('nan') ; print(a == b) ; False` – DeepSpace Aug 21 '17 at 15:15
  • @jlandercy This probably creates more problems than it is trying to solve though. – Ma0 Aug 21 '17 at 15:15
  • @Ev.Kounis I don't think so. NaN means "not a number"; it is the same problem as NULL in a database. How could you compare something that you do not know? Therefore it does not compare, but it is handy to have a truth value, so the comparison evaluates to False instead of raising an error. – jlandercy Aug 21 '17 at 15:17

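Performance-wise, "in" is faster in both cases below: clearly so when the match is the last candidate, and only marginally when it is the first.

>>> import timeit
>>> timeit.timeit("pub='1'; pub == 1 or pub == '1'")
0.07568907737731934
>>> timeit.timeit("pub='1'; pub in [1, '1']")
0.04272890090942383
>>> timeit.timeit("pub=1; pub == 1 or pub == '1'")
0.07502007484436035
>>> timeit.timeit("pub=1; pub in [1, '1']")
0.07035684585571289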

Also, "in" avoids repetition: a == 1 or a == 2 repeats the left-hand side and is harder to read, while "in" makes the intent much easier to understand. It is a simple yet elegant coding practice. In short, we should use "in" more often if we are not already doing so.

Surjit R

I was curious to know what the timing difference was between a straight comparison and checking membership in a collection (called "array" below, although in the code it is a set).

Conclusion: constructing the array is not free, and its cost must be taken into account when considering the speed differences.

If the array is constructed at the time of the comparison, it is slower than the simple comparison. In that case the simple comparison is faster, in or out of a loop.

That said, if the array is already constructed, checking membership in it is faster than the simple comparison, which matters in a large loop.

$ speed.py
inarray                   x 1000000:  0.277590343844
comparison                x 1000000:  0.347808290754
makearray                 x 1000000:  0.408771123295

import timeit

NUM = 1000000  # number of calls timed per function

a = 1
b = 2
c = 3
d = 1

# Containers built once, up front; the "in..." functions reuse them.
array = {b, c, d}  # actually a set, despite the name
tup = (b, c, d)
lst = [b, c, d]

def comparison():
    if a == b or a == c or a == d:
        pass

# The "make..." functions build the container on every call.
def makearray():
    if a in {b, c, d}:
        pass

def inarray():
    if a in array:
        pass

def maketuple():
    if a in (b, c, d):
        pass

def intuple():
    if a in tup:
        pass

def makelist():
    if a in [b, c, d]:
        pass

def inlist():
    if a in lst:
        pass


def time_all(funcs, params=None):
    """Time each function NUM times and print the results, fastest first."""
    timers = []
    for func in funcs:
        if params:
            tx = timeit.Timer(lambda: func(*params))
        else:
            tx = timeit.Timer(lambda: func())
        timers.append([func, tx.timeit(NUM)])

    for func, speed in sorted(timers, key=lambda x: x[1]):
        # Python 3: print() is a function and func_name is now __name__.
        print("{fn:<25} x {n}: ".format(fn=func.__name__, n=NUM), speed)
    print()

time_all([comparison,
          makearray,
          inarray,
          intuple,
          maketuple,
          inlist,
          makelist
          ],
         )

This doesn't quite answer your question as to why you don't often see the comparison written with in. I would be speculating, but it's likely a mix of reasons 1, 2 and 4, plus the particular situation in which the author needed to write that bit of code.

I've personally used both methods depending on the situation. The choice usually came down to speed or simplicity.


edit:

@bracco23 is right: there are slight timing differences depending on whether a tuple, an array (set) or a list is used.

$ speed.py
inarray                   x 1000000:  0.260784980761
intuple                   x 1000000:  0.288696420718
inlist                    x 1000000:  0.311479982167
maketuple                 x 1000000:  0.356532747578
comparison                x 1000000:  0.360010093964
makearray                 x 1000000:  0.41094386108
makelist                  x 1000000:  0.433603059099
Marcel Wilson
  • I added some tests using tuples instead of arrays. When using an already existing tuple, it performs better than the comparison but worse than an already existing array. When we make the tuple on the fly, it performs better than making the array and more or less like the comparison (they switched places a couple of times in the ordered list). – bracco23 Aug 21 '17 at 15:46
  • Good call. I've edited the response to reflect those differences. tuples, list, arrays all have slightly different timings. – Marcel Wilson Aug 21 '17 at 15:58
  • Out of curiosity I modified your test to use a list created with `list(range(1,1000)) + [0]` and `a = 0` (so that `a` is found at the end of the lengthy list); of course, I ran it only for the (pre-made) list, tuple, set, and frozenset, skipping the `or`-ed comparison. The results were quite different: frozenset 0.21, set 0.24, tuple 6.72, list 7.15. Apparently Python is doing a linear search in the tuple or list, rather than converting it implicitly to a set. For Python 3, the difference was even greater (~0.19 vs ~9.8). – Błotosmętek Aug 21 '17 at 19:28
  • @Błotosmętek I'm seeing slightly different numbers. set 0.26, frozenset 0.27, tuple 14.49, list 14.72. Timing differences can be accounted for by CPU but what I don't quite understand is the difference in the order for set vs frozenset between us. I would think we should be seeing the equivalent timing ratios. – Marcel Wilson Aug 21 '17 at 20:38
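The linear search described in the comments above can be shown without timing anything, by counting equality calls instead. This is a rough sketch: `CountingKey` and `count_eq_calls` are hypothetical helpers invented for this illustration, and the exact counts are a CPython implementation detail.

```python
class CountingKey:
    """Wraps a value and counts how often __eq__ is called on it."""

    def __init__(self, value):
        self.value = value
        self.eq_calls = 0

    def __eq__(self, other):
        self.eq_calls += 1
        return self.value == other

    def __hash__(self):
        # Hash like the wrapped value so set lookups land in the same bucket.
        return hash(self.value)


def count_eq_calls(container, value):
    """Run one membership test and return (found, number of __eq__ calls)."""
    key = CountingKey(value)
    found = key in container
    return found, key.eq_calls


items = list(range(1, 1000)) + [0]   # the target value 0 sits at the very end

list_found, list_calls = count_eq_calls(items, 0)
tuple_found, tuple_calls = count_eq_calls(tuple(items), 0)
set_found, set_calls = count_eq_calls(set(items), 0)

print("list :", list_found, list_calls)
print("tuple:", tuple_found, tuple_calls)
print("set  :", set_found, set_calls)
```

The list and tuple need a number of comparisons on the order of the container length, while the set needs only a handful, which is consistent with the large timing gap reported for long containers.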