7

I want to calculate the sum of even numbers within a domain. I have two solutions, but I'm not sure of the advantages/disadvantages of each. Which is the optimal solution?

import sys
domain = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Cal1 = sum(filter(lambda n: n % 2 == 0, domain))  # filter with a lambda predicate
Cal2 = sum([n for n in domain if n % 2 == 0])     # list comprehension
sys.stdout.write("Cal1 = {0}\n".format(Cal1))
sys.stdout.write("Cal2 = {0}\n".format(Cal2))
Michael Mrozek
kilobaik

4 Answers

13

The second really should be just a generator expression, not a list comprehension (you don't need to build a list just to sum the values a generator yields):

Cal2 = sum(n for n in domain if n % 2 == 0)

This is the now-preferred ("Pythonic") way to accomplish this task.

  • Using a list comprehension (the one with the [], your original Cal2) is disadvantageous because it constructs an entire list object just to hand it to sum, which adds allocation and memory overhead.

  • Using filter (your Cal1) gives the same result, but requires a bit more typing and doesn't read quite as well as the generator expression above. (Note that in Python 2, filter also builds an intermediate list; only in Python 3 does it return a lazy iterator.) A sketch comparing all three spellings follows this list.
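
For reference, a minimal sketch (using only the question's own data) confirming that all three spellings compute the same sum:

domain = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# filter + lambda (the question's Cal1)
print(sum(filter(lambda n: n % 2 == 0, domain)))  # 30

# list comprehension (the question's Cal2) - materializes an intermediate list
print(sum([n for n in domain if n % 2 == 0]))     # 30

# generator expression - no intermediate list
print(sum(n for n in domain if n % 2 == 0))       # 30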

Amber
  • Thanks! So what is the use of `lambda`? How can I use it in my programs? Examples, please. – kilobaik Jun 19 '10 at 18:30
  • `lambda` is used when you need to pass a short function to something, and that function is just a basic transformation on its inputs. For instance, some of the `sort` functions in the Python library allow you to pass a function for comparing two items, if you want to sort a list in a non-standard way - then you could use something like `lambda x,y: `. Also, you're welcome. :) – Amber Jun 19 '10 at 18:31
  • @Karamela: Example: `reduce(lambda acc, (i, x): acc ^ i ^ x, enumerate(arr), 0)` http://stackoverflow.com/questions/2605766/how-to-find-a-duplicate-element-in-an-array-of-shuffled-consecutive-integers – jfs Jun 19 '10 at 18:36
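
A minimal sketch of the key-function idiom described in the comments above (the data is invented for illustration; sorted accepts a key callable in both Python 2 and 3):

people = [("alice", 30), ("bob", 25), ("carol", 35)]

# The lambda extracts the sort key (the age) from each pair.
print(sorted(people, key=lambda pair: pair[1]))
# [('bob', 25), ('alice', 30), ('carol', 35)]

# The same throwaway-function style is what the question's Cal1 passes to filter.
print(list(filter(lambda n: n % 2 == 0, range(10))))
# [0, 2, 4, 6, 8]
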
7

Here are the speeds of the various versions on an old-ish Mac laptop:

$ py26 -mtimeit -s'domain = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]' 'sum(filter(lambda n : n % 2 == 0, domain))'
100000 loops, best of 3: 4.41 usec per loop
$ py26 -mtimeit -s'domain = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]' 'sum([n for n in domain if n % 2 == 0])'
100000 loops, best of 3: 2.69 usec per loop
$ py26 -mtimeit -s'domain = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]' 'sum(n for n in domain if n % 2 == 0)'
100000 loops, best of 3: 2.86 usec per loop

Note that, while the genexp version is no doubt more cool, the listcomp is marginally faster here (probably not by enough to worry about unless this code is in a tight inner loop you're striving to optimize the snot out of;-). As usual, the lambda-based version is substantially slower, as others have mentioned -- lambda is kind of a "poor relation" in Python:-(. (Not that a defined function would perform noticeably better here, either.)

Alex Martelli
  • Now try it with a domain of 1000000 numbers and see what happens. The generator has setup overhead, but uses less memory and does less copying. Somewhere between 100k and 1M elements, the generator version becomes faster. – Andrew McGregor Jun 20 '10 at 01:52
  • @Andrew, depends on the amount of available (and addressable;-) RAM -- as long as everything fits comfortably in addressable and physically available RAM listcomp is faster. With a million numbers as you say (Q's domain multiplied by 100000), same old macbook as usual: genexp 178 msec, listcomp 154 msec -- why don't you measure instead of pontificating? Of course at **some** point physically available/addressable RAM will run out and listcomp will slow down badly and even fail while genexp still works -- but on any decent modern machines that takes **WAY** more than a mere million #s. – Alex Martelli Jun 20 '10 at 02:12
  • BTW, @Andrew, you're also totally wrong in saying that the listcomp does extra copying -- it just puts the computed items in distinct locations instead of the same location over and over -- no copying. Both `listcomp` and `genexp` are just about `O(N)` (net of some fixed overhead for each, but that totally amortizes of course;-); listcomp just has a marginally smaller multiplier, so it runs about 10% faster (6% and 15% in these two experiments) -- "not by enough to worry about unless the code is in a tight inner loop", as I said in the A! – Alex Martelli Jun 20 '10 at 02:16
  • Um, the listcomp does use more memory, for all those pointers, and ends up copying them around. And I did measure before posting. 1M entries, listcomp 176 ms, generator 159 ms, on my MacBook Pro. Reason is probably that the generator keeps everything in cache, while the listcomp's intermediate list doesn't fit in the 4MB processor cache on this CPU. That's a very big cache, so machines with smaller caches are going to hurt more. – Andrew McGregor Jun 20 '10 at 04:15
  • @Andrew McGregor: genexpr is slightly faster for 1M entries on Python 2.6.5 on my machine than listcomp (`128` vs `120` ms). Replacing `n % 2` by `n & 1` gives `104` vs. `93` ms. On Python3 listcomp is slightly faster than genexpr. On ideone listcomp is always faster than genexpr http://ideone.com/DNGpm http://ideone.com/8vybt – jfs Jun 20 '10 at 13:38
  • @Andrew, weird that we're measuring just about reversed times on similar machines -- I have no explanation for that. But the block memory copies of pointer blocks (amortized to overall `O(N)` via the usual overallocation strategy) are ultimately performed (in CPython) by the underlying system realloc, which should be optimized like blazes (the memory being copied from is soon going to be released, so lots of the "copies" could trivially be done by simple page-table entry swaps, for example). – Alex Martelli Jun 20 '10 at 19:35
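
A hedged sketch of how one might rerun the million-element comparison discussed in the comments above, using the timeit module directly (variable names here are arbitrary, and absolute numbers will vary by machine and Python version):

import timeit

setup = "domain = list(range(1000000))"
listcomp = "sum([n for n in domain if n % 2 == 0])"
genexp = "sum(n for n in domain if n % 2 == 0)"

# timeit.repeat returns one total time per run; min() is the least noisy estimate.
print("listcomp:", min(timeit.repeat(listcomp, setup, number=10)))
print("genexp:  ", min(timeit.repeat(genexp, setup, number=10)))
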
2

Your second way of doing it is what is called a list comprehension. List comprehensions can achieve the things filter and map were used for before comprehensions were added to the language. See this previous question for a discussion of list comprehensions vs. map, which is similar to what you are asking.

As Amber writes, the recommended Pythonic way to do it is to use a generator expression. With the list comprehension, your entire filtered list is built in memory and then summed. With the generator, each value is summed as it is produced, without ever holding the full list in memory. This makes more of a difference when you are working with far more than 10 items, as the sketch below illustrates.
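
A minimal sketch of that memory difference, using sys.getsizeof (which reports only the container object's own size, not its elements; exact figures vary by Python build):

import sys

domain = list(range(1000000))

evens_list = [n for n in domain if n % 2 == 0]  # materializes ~500,000 references
evens_gen = (n for n in domain if n % 2 == 0)   # fixed-size iterator object

print(sys.getsizeof(evens_list))  # megabytes of pointer storage
print(sys.getsizeof(evens_gen))   # ~100 bytes, regardless of input size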

mikej
2

+1 to the other great answers.

Bonus: the generator expression is also faster than the filter + lambda version...

$ python -m timeit -s 'L = xrange(10)' 'sum(filter(lambda n: n % 2 == 0, L))'
100000 loops, best of 3: 3.59 usec per loop

$ python -m timeit -s 'L = xrange(10)' 'sum(n for n in L if n % 2 == 0)'
100000 loops, best of 3: 2.82 usec per loop

Docs re timeit.

mechanical_meat