For example:

a = [1,2,3]
x = [2*i for i in a]
y = [3*i for i in a]

Would it be more efficient to combine the list comprehensions into one (if possible) if the size of a is large? If so, how do you do this?

Something like,

x,y = [2*i, 3*i for i in a]

which doesn't work. If using list comprehension is not more computationally efficient than using a normal for loop, let me know too. Thanks.

user1473483
  • The latter list comprehension returns a list of tuples, which you then try to assign to a single tuple, which is why this fails. You could flatten this into two separate lists afterwards, but I can't say which is more efficient. – Edd Barrett Jun 21 '12 at 22:47
  • You can think of converting a list of pairs to a pair of lists as [transposing a 2D matrix](http://stackoverflow.com/q/4937491/4279). – jfs Jun 21 '12 at 22:52
  • @user1473483 what about `x,y=[[2*i for i in a],[3*i for i in a]]` ?? – Ashwini Chaudhary Jun 21 '12 at 23:01
  • Efficiency doesn't matter much here (since looping is cheap), but your question would make sense if `a` were a generator that should be only consumed once. – georg Jun 22 '12 at 08:38
  • or 'xy = [(2*i, 3*i) for i in a]' – ctrl-alt-delor Jun 22 '12 at 08:55

2 Answers

You want to use the zip() builtin with the star operator to do this. zip() normally turns two lists into a list of pairs; used with the star operator, it unzips instead, taking a list of pairs and splitting it into two tuples.

>>> a = [1, 2, 3]
>>> x, y = zip(*[(2*i, 3*i) for i in a])
>>> x
(2, 4, 6)
>>> y
(3, 6, 9)

Note that I'm not sure this is really any more efficient in a way that is going to matter.
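
If lists rather than tuples are needed, the same idea can be sketched as follows. This assumes Python 3 (where zip() returns an iterator) and a non-empty input, since unpacking an empty zip() into two names raises a ValueError. The name `pairs` is only illustrative.

```python
a = [1, 2, 3]

# Build the pairs in a single pass over a.
pairs = [(2 * i, 3 * i) for i in a]

# zip(*pairs) transposes the pairs into two tuples;
# map(list, ...) converts each tuple into a list.
x, y = map(list, zip(*pairs))

print(x)  # [2, 4, 6]
print(y)  # [3, 6, 9]
```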

Gareth Latty
  • You could use a generator instead of list comprehension here. `x, y = zip(*((2*i, 3*i) for i in a))` (if it makes any difference) – jadkik94 Jun 21 '12 at 22:55
  • @jadkik94 As you are going to be using all the values anyway with zip, it's pretty much going to make no difference, although what you say is true. – Gareth Latty Jun 21 '12 at 23:06
  • @jadkik94 that seems to be the slowest option, check my answer. – Trufa Jun 21 '12 at 23:24
  • @Lattyware I was actually surprised it even works, unpacking a generator... so I thought I'd share it :) turns out it's slower :( – jadkik94 Jun 21 '12 at 23:41
  • @jadkik94: I believe the main advantage a generator has over building a temporary container is that it requires less memory. Sounds like it's the perennial memory vs speed trade-off... – martineau Jun 22 '12 at 02:04
  • It's just a matter of the process of getting an item from the generator (taking the item, then freezing the process, carrying on) is a lot more work than iterating through a list (add one to the index we are on), so a list will be faster, provided preallocating isn't very slow. In this case, the unpacking will take all the values anyway, so a list comp is just as good. Of course, for any non-trivial generator, that extra cost in processing becomes completely irrelevant. – Gareth Latty Jun 22 '12 at 13:14

When in doubt about efficiency, use the timeit module; it's easy to use:

import timeit

def f1(aRange):
    x = [2*i for i in aRange]
    y = [3*i for i in aRange]
    return x,y

def f2(aRange):
    x, y = zip(*[(2*i, 3*i) for i in aRange])
    return x,y

def f3(aRange):
    x, y = zip(*((2*i, 3*i) for i in aRange))
    return x,y

def f4(aRange):
    x = []
    y = []
    for i in aRange:
        x.append(i*2)
        y.append(i*3)
    return x,y

print("f1: %f" % timeit.Timer("f1(range(100))", "from __main__ import f1").timeit(100000))
print("f2: %f" % timeit.Timer("f2(range(100))", "from __main__ import f2").timeit(100000))
print("f3: %f" % timeit.Timer("f3(range(100))", "from __main__ import f3").timeit(100000))
print("f4: %f" % timeit.Timer("f4(range(100))", "from __main__ import f4").timeit(100000))

The results seem to be consistent in pointing to the first option as the quickest.

f1: 2.127573
f2: 3.551838
f3: 3.859768
f4: 4.282406
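
For anyone who wants to repeat the measurement on Python 3, here is a rough sketch that passes callables to timeit.timeit() instead of setup strings. The function names, the `data` list and the lower repeat count are my own choices for illustration, and the absolute numbers will differ from those above depending on the machine and interpreter.

```python
import timeit

def two_comprehensions(values):
    # One list comprehension per output list (two passes).
    x = [2 * i for i in values]
    y = [3 * i for i in values]
    return x, y

def zip_transpose(values):
    # One pass building pairs, then transposing with zip(*...).
    x, y = zip(*[(2 * i, 3 * i) for i in values])
    return x, y

def explicit_loop(values):
    # Plain for loop appending to both lists.
    x, y = [], []
    for i in values:
        x.append(2 * i)
        y.append(3 * i)
    return x, y

data = list(range(100))  # hypothetical test input
timings = {}
for func in (two_comprehensions, zip_transpose, explicit_loop):
    # timeit.timeit accepts a zero-argument callable as the statement.
    timings[func.__name__] = timeit.timeit(lambda: func(data), number=10000)

for name, seconds in timings.items():
    print("%s: %f" % (name, seconds))
```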
Trufa
  • This is a very good point. I would argue that the first option is probably the clearest as well as the fastest. The only time where this would be preferred would be if `aRange` were a generator that became exhausted, where using this method would avoid having to use `tee()` or a temporary list. – Gareth Latty Jun 22 '12 at 13:17
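
To illustrate the generator case raised in the comments, here is a small sketch assuming Python 3. The generator `numbers()` and the variable names are hypothetical stand-ins for any source that can only be consumed once.

```python
from itertools import tee

def numbers():
    # Hypothetical one-shot source, standing in for any generator.
    yield from (1, 2, 3)

# Two comprehensions over the same generator: the first one
# exhausts it, so the second one produces an empty list.
gen = numbers()
x_twice = [2 * i for i in gen]
y_twice = [3 * i for i in gen]
print(x_twice, y_twice)  # [2, 4, 6] []

# Single pass: build the pairs once, then transpose with zip(*...).
x_zip, y_zip = zip(*((2 * i, 3 * i) for i in numbers()))
print(x_zip, y_zip)  # (2, 4, 6) (3, 6, 9)

# Alternative: tee() gives two independent iterators over one source.
first, second = tee(numbers())
x_tee = [2 * i for i in first]
y_tee = [3 * i for i in second]
print(x_tee, y_tee)  # [2, 4, 6] [3, 6, 9]
```

Note that when one `tee()` iterator is fully consumed before the other, as here, `tee()` has to buffer every item internally, so it saves no memory compared with a temporary list.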