2

https://repl.it/@ArmanTavakoli/List-Comprehension-vs-Any

Why is my any check so much faster than my in check when they are essentially doing the same thing?

from timeit import default_timer as timer
import random

input = [random.randint(0, 100) for x in range(0, 1000000)]

def any_check(input):
  return any(i == 1 for i in input)

def list_comprehension(input):
  return 1 in [num for num in input]

first_start = timer()
any_check(input)
first_end = timer()
print('any_check', first_end - first_start)

second_start = timer()
list_comprehension(input)
second_end = timer()
print('list_comprehension', second_end - second_start)

Results of running the functions 3 times each.

# Calculated with 3 runs each
# Ratio is list_comprehension:any_check

# 10,000 - Mean Ratio: 17.87
# Example run;
# any_check 1.5022000297904015e-05
# list_comprehension 0.00038980199315119535

# 100,000 - Mean Ratio: 140.76
# any_check 2.020499960053712e-05
# list_comprehension 0.0035961729954578914

# 1,000,000 - Mean Ratio: 3379.81
# any_check 2.2904998331796378e-05
# list_comprehension 0.08528400499926647
ARMATAV
  • 604
  • 6
  • 26
  • 8
    Because `any` short-circuits and your implementation does not, always iterating through the entire thing, because you've pointlessly done `[num for num in input]`. – juanpa.arrivillaga Mar 12 '19 at 23:55
  • 1
    Note, both of these could be improved by removing the pointless iteration, `any(i for i in input)` is a convoluted way of doing `any(input)` and `1 not in [num for num in input]` is a convoluted way of doing `1 not in list(input)` which itself is an unnecessarily inefficient way of doing `1 not in input` – juanpa.arrivillaga Mar 12 '19 at 23:57
  • 1
    Essentially, if you catch yourself writing `x for x in whatever`, stop and think if you really need to do that. Even if it *is* useful, e.g. to copy it into a list, just do `list(whatever)` – juanpa.arrivillaga Mar 12 '19 at 23:58
  • 2
    Also note that your two functions do not do the same thing (it's unclear but sounds like you expect them to..?) – sleighty Mar 12 '19 at 23:59
  • 1
    @BrunoEly good point – juanpa.arrivillaga Mar 13 '19 at 00:00
  • @BrunoEly oops - i wanted them to check if True (1) is in the data, I removed the `not` – ARMATAV Mar 13 '19 at 00:02
  • @ARMATAV well, your `any` implementation is checking for any *non-zero* number, not just 1. But the other points i've made still stand. – juanpa.arrivillaga Mar 13 '19 at 00:02
  • @juanpa.arrivillaga Oh - so it should be `return any(i == 1 for i in input)` - right? – ARMATAV Mar 13 '19 at 00:04
  • 1
    @ARMATAV yeah. But really, the important thing is you don't need a list comprehension here at all. That is key. – juanpa.arrivillaga Mar 13 '19 at 00:05
  • @ARMATAV In Python [the only falsy number is 0](https://docs.python.org/2.4/lib/truth.html) so you'd want `list_comprehension` to `return True in [num for num in input]` or something like that. That was just an observation, though, and juanpa.arrivillaga's comment stands that the `any` short-circuits and that's the bulk of the time savings – sleighty Mar 13 '19 at 00:05
  • I see - so basically, if we've got a lot of data, and we reach some check - with `any` we stop and return the moment we find it, but the other way is needless iteration – ARMATAV Mar 13 '19 at 00:07
  • @ARMATAV using `in` would short-circuit as well, **but you are pointlessly iterating over the list** so even though `in` is in-fact short-circuiting, it doesn't matter – juanpa.arrivillaga Mar 13 '19 at 00:19
  • @juanpa.arrivillaga I gotcha - could just be `1 in input` instead of `1 in [pointless iteration with returned list]` – ARMATAV Mar 13 '19 at 00:20

1 Answers1

3

As several people pointed out in comments, the reason your function doing an in test is slower than the version using any is because that function also includes an unnecessary list comprehension that needs to iterate over the whole input before the in operator can begin looking for a match. When run on lists, both in and any can short circuit, quitting early if a matching value is found early in the search. But the list comprehension in your second function always iterates over the whole input even if there was a 1 right at the start.

If you replaced 1 in [num for num in input] with 1 in input, you'd see performance as good or better than in your function using any. Performance would be fairly similar if input was a list, but might be much faster for other container types (such as sets and ranges).

Blckknght
  • 100,903
  • 11
  • 120
  • 169