175

Python generators are very useful. They have advantages over functions that return lists. However, with a list-returning function you can call len(list_returning_function()). Is there a way to len(generator_function())?

UPDATE:
Of course len(list(generator_function())) would work.....
I'm trying to use a generator I've created inside a new generator I'm creating. As part of the calculation in the new generator it needs to know the length of the old one. However I would like to keep both of them together with the same properties as a generator, specifically - not maintain the entire list in memory as it may be very long.

UPDATE 2:
Assume the generator knows its target length even from the first step. Also, there's no reason to maintain the len() syntax. Example - if functions in Python are objects, couldn't I assign the length to a variable of this object that would be accessible to the new generator?

serv-inc
Jonathan Livni
  • 3
    You mean avoiding the obvious `len(list(generator_function()))` ? – 6502 Sep 18 '11 at 10:18
  • 3
    If you *really* need the length, generators are the wrong approach. But frequently, you don't need it. `itertools` can do wonders, and at other times the output length can be predicted (accurately) from the input. –  Sep 18 '11 at 10:20
  • 2
    yes, I mean avoiding the obvious `len(list(generator_function()))` – Jonathan Livni Sep 18 '11 at 10:20
  • Explain why *"as part of the calculation in the new generator it needs to know the length of the old one"*, that's evil and we can probably eliminate that. [itertools](http://docs.python.org/library/itertools.html) has a bunch of constructs for that. – smci Sep 18 '11 at 10:46
  • 1
    e.g. the old generator produces a certain random function and the new generator performs a calculation that depends on the current time and on the length of the vector. I don't see how this use would be evil. Trust me that I have a need for this and that it's architecturally sound in my system. – Jonathan Livni Sep 18 '11 at 11:15
  • It doesn't sound like a generator is what you need. Even in the simple case, a generator may be producing a series ad infinitum, which is very likely with PRNGs. The only knowledge you can get is the length of the series produced by the generator *so far*. – Michael Foukarakis Sep 18 '11 at 11:19
  • I understand this is not an intrinsic property of generators, however I _am_ looking for an elegant way to add this functionality in my particular case – Jonathan Livni Sep 18 '11 at 12:46
  • Note, that generators are only usable during the first iteration. Subsequent iterations over the same generator object yield no elements. – Maxim Egorushkin Sep 18 '11 at 13:04
  • NOT A DUPLICATE. In this case the generated **items are needed** and we know how many of them there are. The [linked question](http://stackoverflow.com/questions/393053/length-of-generator-output) is about the (less common) case when _all_ we care about is the length and the items can be tossed. So the title of this question should be **How to len(generator()) when the items and their quantity are knowable and needed**. – Bob Stein Apr 06 '16 at 18:16
  • `(1 for _ in generator_function()` looks ok to me. – Martin Mar 26 '21 at 10:33

8 Answers

336

The conversion to list that's been suggested in the other answers is the best way if you still want to process the generator elements afterwards, but has one flaw: It uses O(n) memory. You can count the elements in a generator without using that much memory with:

sum(1 for x in generator)

Of course, be aware that this might be slower than len(list(generator)) in common Python implementations, and if the generators are long enough for the memory complexity to matter, the operation would take quite some time. Still, I personally prefer this solution as it describes what I want to get, and it doesn't give me anything extra that's not required (such as a list of all the elements).
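
As a rough sketch of the counting idiom above (the `squares` generator is a made-up example, not something from the question), note that counting exhausts the generator, so nothing is left to iterate over afterwards:

```python
def squares(n):
    """Hypothetical example generator: yields n square numbers."""
    for i in range(n):
        yield i * i

# Count without building a list: only one item is alive at a time.
count = sum(1 for _ in squares(1000))

# Counting consumes the generator object it is given.
gen = squares(5)
n = sum(1 for _ in gen)
leftover = list(gen)  # the generator is already exhausted here

print(count, n, leftover)
```

If the items are needed after counting, either call the generator function again (when it is cheap and deterministic) or fall back to building the list.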

Also listen to delnan's advice: If you're discarding the output of the generator it is very likely that there is a way to calculate the number of elements without running it, or by counting them in another manner.

sschuberth
Rosh Oxymoron
  • 67
    This is the best answer imo. However, it would be slightly more pythonic to write: sum(1 for _ in generator) – RussellStewart Oct 25 '13 at 18:17
  • 4
    As a Python noob I'm probably missing something obvious, but what's the point in using this approach if you cannot use the generator afterwards anymore to generate the actual values, as you already used it to count the values (and generators are fire-once AFAIU)? – sschuberth Feb 07 '17 at 15:05
  • 7
    @sschuberth: You are right. If you need both the length and the values (and you don't control the origin of the generator), turning it into the list is the best option. – Evert Heylen May 31 '17 at 00:48
  • 3
    Found this in scikit-learn's source :) https://github.com/scikit-learn/scikit-learn/blob/a24c8b46/sklearn/preprocessing/data.py#L1314 – victorlin Nov 21 '17 at 23:07
  • 2
    @sschuberth if I'm writing tests and want to assert the number of things something returns, but don't care how or what they are – OrangeDog Aug 17 '18 at 16:15
  • 1
    Came here looking for the answer that took the memory considerations of list(generator) into account. Thanks! – Graham Lea Dec 04 '19 at 10:11
  • 2
    Why do you think that this could be slower than `len(list(generator))`? – johk95 May 28 '21 at 00:07
  • This is better memory-wise, of course, but it still can't help in cases of gens containing a huge number of elements. That is, it might take eons to find the sum. (I have such a case.) – Apostolos Aug 22 '22 at 08:43
72

Generators have no length, they aren't collections after all.

Generators are functions with an internal state (and fancy syntax). You can repeatedly call them to get a sequence of values, so you can use them in a loop. But they don't contain any elements, so asking for the length of a generator is like asking for the length of a function.

if functions in Python are objects, couldn't I assign the length to a variable of this object that would be accessible to the new generator?

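Functions are objects, and a function object does accept new attributes. What you cannot do is assign new attributes to the generator object that calling a generator function returns. The reason is probably to keep such a basic object as efficient as possible.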

You can however simply return (generator, length) pairs from your functions or wrap the generator in a simple object like this:

class GeneratorLen(object):
    def __init__(self, gen, length):
        self.gen = gen
        self.length = length

    def __len__(self): 
        return self.length

    def __iter__(self):
        return self.gen

g = some_generator()    # any generator object whose length you already know
h = GeneratorLen(g, 1)  # 1 is the length known in advance
print(len(h), list(h))
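
A minimal sketch of the attribute-based variant the question asks about, under the assumption that the length is known up front: plain function objects accept new attributes, while the generator objects they return do not. The names `numbers`, `scaled` and the `length` attribute are illustrative only.

```python
def numbers():
    """Hypothetical generator function that knows its own length."""
    for i in range(numbers.length):
        yield i

# Store the known length on the function object itself.
numbers.length = 5

def scaled():
    """New generator that needs the length of the old one."""
    n = numbers.length
    for x in numbers():
        yield x / n

result = list(scaled())

# A generator *object* rejects new attributes, unlike the function.
g = numbers()
try:
    g.length = 5
    generator_accepts_attrs = True
except AttributeError:
    generator_accepts_attrs = False

print(result, generator_accepts_attrs)
```

The attribute lives on the function, so it is shared by every generator created from it; a wrapper object like the one above is the safer choice when different instances have different lengths.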
Jochen Ritzel
  • 149
    No, it's pretty obvious that it *is* meaningful to talk about the length of many generators, since many generators return a finite number of elements. Your argument that it is meaningless proves too much; if we accept it, then by the same logic it's also meaningless to convert the output of a generator to a list... and yet `list(generator())` works and is built into the language. – Mark Amery Jul 21 '15 at 10:40
  • 1
    @MarkAmery Part of what makes generators so flexible is that they _don't_ have to provide a `__len__` method (or a Java-like `hasNext` and `remove`, or ...). What would [`itertools.count`](https://docs.python.org/3/library/itertools.html#itertools.count) return? There's no "infinity" integer in Python. And what about generators that _don't know_ when they'll be done? To write an efficient `__len__` method for a [Goldbach's Conjecture](https://en.wikipedia.org/wiki/Goldbach%27s_conjecture) generator, you'd first have to answer one of the biggest open questions in mathematics. – Kevin J. Chase Mar 15 '16 at 00:08
  • 7
    @KevinJ.Chase I could just as well ask *"What would `sum(itertools.count())` return?"*, yet `sum` can take generators. There's an obvious possible way to implement `len()` on arbitrary iterables: have it iterate and count how many elements there are. I'd argue that this would be an unhelpful feature to have (knowing the length of a consumed generator whose elements you've discarded isn't going to be useful in most cases), but the fact that it would loop forever on infinite generators plainly isn't the knock-down argument you think it is because `sum()` and `list()` have the same behaviour. – Mark Amery Mar 16 '16 at 11:02
  • 4
    ...and to "return the length of a consumed generator whose elements you've discarded" **is** sometimes useful: As the outermost consumer of a generator chain, you just want to report the number of elements the inner generators operated on (e.g. how many database records were written). – Jonathan Hartley Jun 10 '16 at 12:12
  • 2
    @Rosh Oxymoron's answer is exactly what I was looking for. – Jonathan Hartley Jun 10 '16 at 12:39
  • You can, in fact, assign attributes to a function. The standard library even exploits this: see the `__wrapped__` attribute added by `functools.wraps`. – Karl Knechtel Jul 03 '22 at 06:50
20

Suppose we have a generator:

def gen():
    for i in range(10):
        yield i

We can wrap the generator, along with the known length, in an object:

import itertools
class LenGen(object):
    def __init__(self,gen,length):
        self.gen=gen
        self.length=length
    def __call__(self):
        return itertools.islice(self.gen(),self.length)
    def __len__(self):
        return self.length

lgen=LenGen(gen,10)

Instances of LenGen behave like generator functions, since calling them returns an iterator.

Now we can use the lgen generator in place of gen, and access len(lgen) as well:

def new_gen():
    for i in lgen():
        yield float(i)/len(lgen)

for i in new_gen():
    print(i)
unutbu
  • you solved it, but with a class. I... didn't expect that :) Is there any advantage in trying to keep the design as a function? – Jonathan Livni Sep 18 '11 at 12:50
  • 2
    @Jonathan: My first attempt was to attach an attribute to the generator object, `gen()`. Unlike with functions, however, Python does not allow you to attach additional attributes to generator objects. Because of this restriction, I went with a class. – unutbu Sep 18 '11 at 12:55
  • Clearly won't work for things you don't know the length of... for example traversing hierarchies of objects with selection/filtering. @oxymoron's answer works for that case. – qneill Jun 24 '21 at 03:25
18

You can use len(list(generator_function())). This consumes the generator, but it is the only general way to find out how many elements are generated. So you may want to save the list somewhere if you also want to use the items.

a = list(generator_function())
print(len(a))
print(a[0])
Greg Hewgill
  • 2
    The catch is, in addition to consuming the generator (which is, obviously, necessary, unless the generator explicitly provides support for `.__len__()`), this **temporarily stores ALL the generator's items in RAM at once**. – JamesTheAwesomeDude Sep 10 '20 at 07:03
10

You can len(list(generator)) but you could probably make something more efficient if you really intend to discard the results.

Ben Jackson
8

You can use reduce.

For Python 3:

>>> import functools
>>> def gen():
...     yield 1
...     yield 2
...     yield 3
...
>>> functools.reduce(lambda x,y: x + 1, gen(), 0)
3

In Python 2, reduce is in the global namespace so the import is unnecessary.

hwiechers
  • This is basically reinventing `sum(1 for _ in gen())`, but significantly slower (because a function call is needed for every increment, and it can't benefit from `sum`'s optimizations to sum with C level integers until it gets too big for C level types). For a 10K input, on my 3.9.5 install, this takes over 2.1x as long; the differences are less for small inputs, but `sum` always wins. – ShadowRanger Sep 30 '21 at 22:12
8

You can use send as a hack:

def counter():
    length = 10
    i = 0
    while i < length:
        val = (yield i)
        if val == 'length':
            yield length
        i += 1

it = counter()
print(next(it))  # next(it) works on both Python 2 and Python 3
#0
print(next(it))
#1
print(it.send('length'))
#10
print(next(it))
#2
print(next(it))
#3
cyborg
  • This code doesn't work for me (python 3.6). If I do `it.next()` I get `AttributeError: 'generator' object has no attribute 'next'`. `next(it)` works, though. – bli Feb 04 '17 at 10:04
  • 1
    @bli: They changed the iterator method name from `.next()` on Py2 to `.__next__()` on Py3 to make it consistent with other special method names (that are all of the form `__METHODNAME__`). Your solution of calling the top-level builtin `next(it)` works on both Py2 and Py3, so it's definitely the way to go (both for portable code and for avoiding directly invoking special methods, which is typically bad form). – ShadowRanger Sep 30 '21 at 22:18
5

You can combine the benefits of generators with the certainty of len(), by creating your own iterable object:

class MyIterable(object):
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __iter__(self):
        self._gen = self._generator()
        return self

    def _generator(self):
        # Put your generator code here
        i = 0
        while i < self.n:
            yield i
            i += 1

    def __next__(self):
        return next(self._gen)

    next = __next__  # Python 2 spelling of the same method

mi = MyIterable(100)
print(len(mi))
for i in mi:
    print(i)

This is basically a simple implementation of xrange (range in Python 3), which returns an object you can take the len of, but doesn't create an explicit list.
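
A simpler variant, sketched here along the lines suggested in the comments below, makes `__iter__` itself a generator function. Each call to `iter()` then returns a fresh, independent iterator, and no separate `next` method is needed. The class name `SizedIterable` is illustrative.

```python
class SizedIterable:
    """Iterable with a known length whose items are produced lazily."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __iter__(self):
        # Put your generator code here; every iter() call starts over.
        i = 0
        while i < self.n:
            yield i
            i += 1

items = SizedIterable(3)
length = len(items)
first_pass = list(items)
second_pass = list(items)  # a second iteration works as well

print(length, first_pass, second_pass)
```

The iterators returned by `__iter__` have no length of their own, but the iterable does, which is usually what the consumer needs.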

Ned Batchelder
  • you solved it, but with a `class`. I... didn't expect that :) Is there any advantage in trying to keep the design as a function? – Jonathan Livni Sep 18 '11 at 12:50
  • 1
    That suffers from a little bug: The iterator you create restarts every time you call `iter` on it *or* on the original iterable. It will have less surprising behaviour if you rename `_generator` to `__iter__` and remove `next`. Your iterator won't have a length, but that's not an issue since the iterable will be. (Another fix is to call `self._generator` during `__init__` and *not* during `__iter__`.) – Rosh Oxymoron Sep 18 '11 at 13:36