3

I experimented a little bit with list comprehensions and the filter() function today, because I was interested to see if there are significant efficiency improvements if using one over the other. The results are a little bit confusing. When I filtered for even numbers, list comprehensions outperformed the traditional nested structure and the filter() function by ~1.5x (i.e., it was ~1.5x faster).
But when I was using a function to check if a number was a prime number or not, the filter() function was suddenly the fastest.

I posted more details below, and I uploaded the code at github if you want to try it out yourself: https://github.com/rasbt/list_comprehension_test

I tested the code with different range maximum values n multiple times to make sure that the results are consistent and not affected by some temporary background process on my machine.

My questions:

  • Any idea why filter function is so slow when filtering for even numbers? Could it be, because of the lambda function or because I am converting the generator object into a list?
  • why are the results for the is_prime function so similar, and why is the filter function the fastest here?

1st Part: collecting even numbers


a) loop and else-if

even_nums = []
for i in range(1, n):
    if i % 2 == 0:
        even_nums.append(i)

b) list comprehension:

even = [i for i in range(1, n) if i % 2 == 0]

c) filter() function

even_nums = list(filter((lambda x: x%2 != 0), range(1, n)))

results for is_even

  • loop and else-if: 1x (reference)
  • list comprehension: 1.5x faster
  • filter() function: 0.9x faster

2nd Part: Collecting Prime Numbers


def is_prime(num):
    """ Returns True if input integer is a prime number. """
    prime = True
    if num < 2:
        prime = False

    elif num == 2:
        prime = True
    else:
        for i in range(2, num):
            if num % i == 0:
                prime = False    
                break
    return prime

a) loop and else-if

primes = []
for i in range(1, n):
    if is_prime(i):
        primes.append(i)

b) list comprehension:

primes = [i for i in range(1, n) if is_prime(i)]

c) filter() function

primes = list(filter(is_prime, range(1, n)))

results for is_prime

  • loop and else-if: 1x (reference)
  • list comprehension: 0.98x faster
  • filter() function: 1.13x faster
  • 2
    I guess it depends what dominates, checking the value or appending to the existing list. In Python 2 you could see if [`itertools.ifilter`](http://docs.python.org/2/library/itertools.html#itertools.ifilter) makes any difference. – jonrsharpe Jan 13 '14 at 00:13
  • Thanks, that's a good point –  Jan 13 '14 at 00:14
  • possible duplicate of [List filtering: list comprehension vs. lambda + filter](http://stackoverflow.com/questions/3013449/list-filtering-list-comprehension-vs-lambda-filter) – jonrsharpe Jan 13 '14 at 00:19
  • Have you tried doing performance analysis? Docs for profiling are [here](http://docs.python.org/2/library/profile.html) and there is an existing SO question [here](http://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script) – SleepyCal Jan 13 '14 at 00:19
  • Haven't heard of it, yet. But looks very very useful to me, I will use it to do a more comprehensive analysis/check in the next couple of days. –  Jan 13 '14 at 00:27

1 Answers1

1

If you implement the first test this way, the results should be consistent with the second test:

is_even = lambda i: i % 2 == 0
even = [i for i in range(1, n) if is_even(i)]

In the filter implementation, there is a function call once per iteration (the lambda), which is an extra step. This difference is not present in the second test because in that case both implementations already consist of a call (is_prime) once per iteration.

As for why filter is slightly faster, I suspect it's related to filter being native rather than python code. Consider that the list comprehension still has an additional evaluation of python code per iteration: namely, the i before the for. This evaluation step would not be necessary in filter which could directly yield the value in the native implementation.

nmclean
  • 7,564
  • 2
  • 28
  • 37