5

I have something that, when run as a list comprehension, runs fine.

It looks like this:

[myClass().Function(things) for things in biggerThing]

Function is a method that builds a list. The method itself doesn't return anything, but lists get manipulated within it.

Now, when I change it to a generator expression,

(myClass().Function(things) for things in biggerThing)

It doesn't manipulate the data like I would expect it to. In fact, it doesn't seem to manipulate it at all.
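To make that concrete, here is a minimal stand-alone version of what I mean (the class is a made-up stand-in for my real myClass):

class myClass:
    results = []                           # class-level list the method appends to

    def Function(self, thing):
        myClass.results.append(thing * 2)  # side effect only; returns None

biggerThing = [1, 2, 3]

[myClass().Function(t) for t in biggerThing]
print(myClass.results)   # [2, 4, 6] -- the method ran for every element

myClass.results = []
g = (myClass().Function(t) for t in biggerThing)
print(myClass.results)   # [] -- the method never ran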

What is the functional difference between a list comprehension and a generator?

myacobucci
  • http://stackoverflow.com/questions/47789/generator-expressions-vs-list-comprehension – karthikr Nov 12 '13 at 15:54
  • Don't use a list comprehension for the side effects. Now you are building a list of `None` values and discarding it again, wasting CPU and memory. – Martijn Pieters Nov 12 '13 at 15:54
  • Why are you using a list comprehension if you don't want to build a list? Usually people learn about `for` loops before they learn about list comprehensions, but maybe you went the other way? – DSM Nov 12 '13 at 15:54

5 Answers

5

Generators are evaluated on the fly, as they are consumed. So if you never iterate over a generator, its elements are never evaluated.

So, if you did:

for _ in (myClass().Function(things) for things in biggerThing):
    pass

Function would run.
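Consumption can also be partial: only the elements you actually pull from the generator get evaluated. A quick sketch (the noisy helper is just for illustration):

import itertools

def noisy(x):
    print('evaluating', x)
    return x * x

gen = (noisy(x) for x in range(5))
list(itertools.islice(gen, 2))   # prints 'evaluating 0' and 'evaluating 1' only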


Now, your intent really isn't clear here.

Instead, consider using map:

map(myClass().Function, biggerThing)  

Note that this will always use the same instance of myClass, because the bound method is created once, before map starts iterating. Also note that in Python 3, map returns a lazy iterator, so it too would have to be consumed before Function runs (in Python 2 it builds the list immediately).
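For illustration, a sketch of that shared-instance behavior (with a made-up stand-in class):

class myClass:
    def Function(self, thing):
        print(id(self), thing)      # id(self) is identical for every element

bound = myClass().Function          # the single instance is created here, once
for _ in map(bound, ['a', 'b']):
    pass                            # both lines printed share the same id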

If that's a problem, then do:

for things in biggerThing:
    myClass().Function(things)
Thomas Orozco
  • I don't think `map` would be a better solution than a generator or a list comprehension. If you don't care about the return value of `myClass().Function(things)`, there is no reason to waste memory by storing the results. – Blender Nov 12 '13 at 15:59
  • The regular for loop is how I had it structured before. I was just wondering whether using a generator might speed up the program's execution. – myacobucci Nov 12 '13 at 15:59
  • @Blender I was essentially pointing out that the intent would be clearer using map. Now, if performance is a concern, a for loop is probably the most appropriate. – Thomas Orozco Nov 12 '13 at 16:09
3

Generators are lazily evaluated. You need to consume a generator in order for your function to be evaluated. One way to consume a generator completely is collections.deque with maxlen=0:

import collections
generator = (myClass().Function(thing) for thing in biggerThing)
collections.deque(generator, maxlen=0)

Also consider using @staticmethod or @classmethod, or change to

myfunc = myClass().Function
generator = (myfunc(thing) for thing in biggerThing)
collections.deque(generator, maxlen=0)

to avoid creating a new instance of myClass for each thing processed.

Update: performance

  1. collections.deque vs plain iteration
def l():
    for x in range(100):
        y = x**2
        yield y

def consume(it):
    for i in it:
        pass

>>> timeit.timeit('from __main__ import l, consume; consume(l())', number=10000)
0.4535369873046875
>>> timeit.timeit('from __main__ import l, collections; collections.deque(l(), 0)', number=10000)
0.24533605575561523
  2. instance vs class vs static methods
class Test(object):
    @staticmethod
    def stat_pow(x):
        return x**2
    @classmethod
    def class_pow(cls, x):
        return x**2
    def inst_pow(self, x):
        return x**2

def static_gen():
    for x in range(100):
        yield Test.stat_pow(x)

def class_gen():
    for x in range(100):
        yield Test.class_pow(x)

def inst_gen():
    for x in range(100):
        yield Test().inst_pow(x)

>>> timeit.timeit('from __main__ import static_gen as f, collections; collections.deque(f(), 0)', number=10000)
0.5983021259307861
>>> timeit.timeit('from __main__ import class_gen as f, collections; collections.deque(f(), 0)', number=10000)
0.6772890090942383
>>> timeit.timeit('from __main__ import inst_gen as f, collections; collections.deque(f(), 0)', number=10000)
0.8273470401763916
alko
  • What are the performance benefits of using `collections.deque`? – myacobucci Nov 12 '13 at 16:10
  • `collections.deque` is probably the fastest way to consume a generator, but for this situation the overhead of instantiating your `myClass()` object to call one method on it and then throwing it away will almost certainly swamp any micro-optimisations. – Duncan Nov 13 '13 at 09:36
  • @Duncan If you look at the test results on my machine (bottom of this answer), in a simple case the benefits of the deque approach outweigh the cost of myClass instantiation. It might be otherwise for more complex classes and functions; you can run your own tests. – alko Nov 13 '13 at 09:50
  • I cannot reproduce your collections-vs-iteration times. Running your exact code I get 0.42 for `consume` and 0.43 for `deque`. I ran it several times and `consume` is always slightly faster for me. Also, you aren't comparing like with like there. The comparison that should be made on the original code is `collections.deque(generator, maxlen=0)` versus just doing `for things in biggerThing: myClass().Function(things)`. Your comparison forces an extra for loop to consume a generator when the generator isn't needed at all. – Duncan Nov 13 '13 at 11:57
  • @Duncan Are you running CPython, PyPy, or something else? What machine architecture (64- or 32-bit) and OS? – alko Nov 13 '13 at 17:22
  • That was Python 3.3.0, 32-bit, on Windows 7. Just re-ran that and got 0.427 for `consume()` vs 0.431 for `deque()`. Running the identical code on Python 2.7.2 gave me 0.139 vs 0.143. Pypy 1.7 is the only one where `deque` wins: 0.148 vs 0.105 on the first run, but if I repeat each `timeit` call I get 0.023 vs 0.030 for the second. Don't you just love a good JIT. – Duncan Nov 14 '13 at 09:12
2

When you create a generator, you are only able to use each element once. It's like a batch of cookies that I'm eating as I go: they serve their purpose (they make me happy), but they're gone once they're eaten.

List comprehensions create lists, and a list lets you access its contents as many times as you like (ostensibly forever). You can also use all the list methods on it (very useful). The idea is that it creates an actual data structure, something that holds data for you.
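A quick illustration of the difference:

squares_list = [x * x for x in range(3)]
print(sum(squares_list))   # 5
print(sum(squares_list))   # 5 again: the list can be reused

squares_gen = (x * x for x in range(3))
print(sum(squares_gen))    # 5
print(sum(squares_gen))    # 0: the generator is already exhausted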

Check out this post right here: Generators vs. List Comprehensions

1

A generator won't execute the function until you call next() on it (iterating it with a for loop, or with list(), calls next() for you).

>>> def f():
...     print('Hello')
...
>>> l = [f() for _ in range(3)]
Hello
Hello
Hello
>>> g = (f() for _ in range(3))  # nothing happens yet
>>> next(g)
Hello
Alexander Zhukov
0

List comprehension:

  • A list can be indexed, e.g. [0, 1, 2, 3, 4][0]

  • A created list can be used any number of times.

  • An empty list occupies 72 bytes (on this 64-bit build), and each item adds at least 8 bytes for its pointer (more in practice, because lists over-allocate as they grow).

Generators:

  • Generators can't be indexed.

  • A generator can be used only once.

  • A generator occupies a constant, much smaller amount of memory (80 bytes here), no matter how many items it will yield.

Note that a generator is emptied once it has been consumed; iterating it a second time yields nothing.

>>> sys.getsizeof([])
72
>>> list1 = [x for x in range(0, 5)]
>>> sys.getsizeof(list1)
136
>>>
>>> generator1 = (x for x in range(0,100))
>>> sys.getsizeof(generator1)
80
>>> generator1 = (x for x in range(0,5))
>>> sys.getsizeof(generator1)
80
>>> list(generator1)
[0, 1, 2, 3, 4]
>>> list(generator1)
[]
>>> 
SuperNova