174

Python provides a nice way to get the length of an eager iterable: len(x). But I couldn't find anything similar for lazy iterables such as generator expressions and generator functions. Of course, it is not hard to write something like:

def iterlen(x):
  n = 0
  try:
    while True:
      next(x)
      n += 1
  except StopIteration: pass
  return n

But I can't shake the feeling that I'm reimplementing a bicycle.

(While I was typing the function, a thought struck me: maybe there really is no such function because it "destroys" its argument. Not an issue in my case, though.)

P.S.: concerning the first answers - yes, something like len(list(x)) would work too, but that drastically increases memory usage.

P.P.S.: re-checked... Disregard the P.S., seems I made a mistake while trying that, it works fine. Sorry for the trouble.

  • Suggest title change to **Length of generator output ONLY -- the iterated items can be tossed**. Otherwise this question is confused with [another](http://stackoverflow.com/q/7460836/673991). – Bob Stein Apr 06 '16 at 18:19
  • 4
    `reimplementing a bicycle` - almost like reinventing the wheel, only a programmer said it. – Cullub Dec 05 '16 at 17:11

9 Answers

306

The easiest way is probably just sum(1 for _ in gen) where gen is your generator.
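For example, a minimal sketch (the generator expression below is just a throwaway example):

gen = (x * x for x in range(10))   # any generator
length = sum(1 for _ in gen)       # counts by iterating, so it consumes the generator
print(length)                      # 10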

Thomas Ahle
  • 30,774
  • 21
  • 92
  • 114
Matt Dunham
  • 3,069
  • 2
  • 14
  • 2
  • 13
    As much as I like this solution, the major downside here is that it's not at all obvious by reading the code what you're trying to achieve. If I saw this line in someone else's code, I'd pause to think "why is he taking the sum here?" - unless I'd seen this "hack" before. – Charles Salvia Aug 09 '12 at 13:47
  • 26
    @CharlesSalvia that's what comments are for imho. Getting the length of a generator is comment-worthy I'd say. – Niels Bom Jun 05 '13 at 15:35
  • 47
    Another major downside is that it exhausts the generator just to get the length, which usually defeats the whole purpose of generators in the first place. – ely Sep 08 '14 at 15:57
  • 5
    Note that this might be less memory consuming but it seems to be slower than simply converting it to a list. – lumbric May 16 '15 at 08:23
  • 6
    Arguably, `len(list(gen))` is clearer, and according to the below answer, is more efficient – Anish Gupta Aug 13 '20 at 21:32
  • 1
    This clears the gen and I cannot use it afterwards since it got emptied – alper Nov 17 '20 at 18:15
100

For those who would like a summary of that discussion: the final results for counting a 50-million-element generator expression using:

  • len(list(gen)),
  • len([_ for _ in gen]),
  • sum(1 for _ in gen),
  • ilen(gen) (from more_itertools),
  • reduce(lambda c, i: c + 1, gen, 0),

sorted by execution speed (with memory consumption noted), may surprise you:

#1: test_list.py:8: 0.492 KiB
    gen = (i for i in data*1000); t0 = monotonic(); len(list(gen))
('list, sec', 1.9684218849870376)

#2: test_list_compr.py:8: 0.867 KiB
    gen = (i for i in data*1000); t0 = monotonic(); len([i for i in gen])
('list_compr, sec', 2.5885991149989422)

#3: test_sum.py:8: 0.859 KiB
    gen = (i for i in data*1000); t0 = monotonic(); sum(1 for i in gen); t1 = monotonic()
('sum, sec', 3.441088170016883)

#4: more_itertools/more.py:413: 1.266 KiB
    d = deque(enumerate(iterable, 1), maxlen=1)
   
    test_ilen.py:10: 0.875 KiB
    gen = (i for i in data*1000); t0 = monotonic(); ilen(gen)
('ilen, sec', 9.812256851990242)

#5: test_reduce.py:8: 0.859 KiB
    gen = (i for i in data*1000); t0 = monotonic(); reduce(lambda counter, i: counter + 1, gen, 0)
('reduce, sec', 13.436614598002052)

So, len(list(gen)) turns out to be both the fastest and the least memory-consuming option.
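For reference, a rough sketch of how such a timing comparison could be reproduced (the data list here is made up, absolute numbers depend on the machine, and the memory figures above came from a separate memory profiler):

from time import monotonic

data = list(range(50000))   # 50,000 items, repeated 1000x below -> 50 million

gen = (i for i in data * 1000); t0 = monotonic(); len(list(gen))
print('list, sec', monotonic() - t0)

gen = (i for i in data * 1000); t0 = monotonic(); sum(1 for _ in gen)
print('sum, sec', monotonic() - t0)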

mkrieger1
  • 19,194
  • 5
  • 54
  • 65
Alex-Bogdanov
  • 2,172
  • 1
  • 19
  • 20
  • 5
    Personally I found the len list method to take twice as long as the sum method. So results may vary. – steveayre Sep 03 '20 at 18:49
  • FYI, `more_itertools` improved their implementation based on [my improved version of their code that uses a `maxlen=0` `deque` to trigger a hyper-optimized consume of the input](https://stackoverflow.com/a/34404546/364696); it's still slower than `len(list(gen))` when the `list` doesn't grow so large as to cause swap thrashing, but it only takes about 50% longer, and for inputs of meaningful size, it takes about half as long as `sum(1 for _ in gen)`. – ShadowRanger Sep 30 '21 at 22:26
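For reference, the counting trick that comment describes looks roughly like this (a sketch, not the verbatim more_itertools source):

from collections import deque
from itertools import count

def ilen_sketch(iterable):
    counter = count()
    # zip advances the counter once per element; the zero-length deque
    # consumes the pairs at C speed without storing anything
    deque(zip(iterable, counter), maxlen=0)
    return next(counter)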
42

There isn't one because you can't do it in the general case - what if you have a lazy infinite generator? For example:

def fib():
    a, b = 0, 1
    while True:
        a, b = b, a + b
        yield a

This never terminates but will generate the Fibonacci numbers. You can get as many Fibonacci numbers as you want by calling next().
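For instance, a quick sketch of pulling a handful of values from it:

gen = fib()
first_five = [next(gen) for _ in range(5)]
print(first_five)  # [1, 1, 2, 3, 5]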

If you really need to know the number of items there are, then you can't iterate through them linearly one time anyway, so just use a different data structure such as a regular list.

Smi
  • 13,850
  • 9
  • 56
  • 64
Adam Rosenfield
  • 390,455
  • 97
  • 512
  • 589
  • 103
    I'm not sure I believe/accept the explanation. `sum` takes an iterable, even though that iterable might be infinite and hence "you can't do it in the general case" any more than you can do len in the general case. Perhaps a more likely rationale is that people "expect" `len` to be O(1), which it isn't for a general iterable? – Steve Jessop Aug 18 '10 at 15:12
  • 14
    Regular lists consume more memory, which is something the OP wants to avoid. – akaihola May 11 '11 at 12:32
  • @Steve Jessop: If you have many objects, counting them is obviously O(n) in general. If you keep track of the number of objects while collecting them, it is O(1). For many special cases you might be able to use the objects nature to make up a better algorithm (i.e. counting grains of rice by weighing them). Memory consumption can be used to count objects if they are lined up in memory. But for generators there is no such method in general. – lumbric May 16 '15 at 08:30
  • 1
    I have a filtered list I expect to be on the order of 2000000000 elements. I can't just use a regular list; I _need_ to use a generator. Now, because of how these elements are being sourced, I can actually run through them pretty efficiently -- I just can't store them because I don't have 40 gigs of memory. This answer is utterly, completely useless for me. – Nic Feb 23 '19 at 02:25
19
def count(iter):
    return sum(1 for _ in iter)

Or better yet:

def count(iter):
    try:
        return len(iter)
    except TypeError:
        return sum(1 for _ in iter)

If the argument doesn't support len(), the TypeError is caught and we fall back to counting; if it isn't iterable at all, the TypeError from the generator expression propagates.

Or, if you want to count something specific in the generator:

def count(iter, key=None):
    if key:
        if callable(key):
            return sum(bool(key(x)) for x in iter)
        return sum(x == key for x in iter)
    try:
        return len(iter)
    except TypeError:
        return sum(1 for _ in iter)
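A quick usage sketch of that last version (the sample list is made up):

data = [1, 2, 2, 3, 2]
count(data)                        # 5 -- uses len() directly
count(x for x in data)             # 5 -- falls back to counting, generators have no len()
count(data, key=2)                 # 3 -- counts elements equal to 2
count(data, key=lambda x: x > 1)   # 4 -- counts elements matching a predicate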
mpen
  • 272,448
  • 266
  • 850
  • 1,236
8

By definition, only a subset of generators terminate after yielding a certain number of elements (i.e. have a pre-defined length), and even then, only a subset of those finite generators have a predictable end (advancing the generator can have side effects that stop it earlier).

If you wish to give your generator a length, you first have to define what you consider the "length" (the total number of elements? the number of remaining elements?), then wrap your generator in a class. Here's an example:

class MyFib(object):
    """
    A class iterator that iterates through values of the
    Fibonacci sequence, until, optionally, a maximum length is reached.
    """

    def __init__(self, length):
        self._length = length
        self._i = 0

    def __iter__(self):
        a, b = 0, 1
        while not self._length or self._i < self._length:
            a, b = b, a + b
            self._i += 1
            yield a

    def __len__(self):
        "This method returns the total number of elements"
        if self._length:
            return self._length
        else:
            raise NotImplementedError("Infinite sequence has no length")
            # or simply return None / 0 depending
            # on implementation

Here is how to use it:

In [151]: mf = MyFib(20)

In [152]: len(mf)
Out[152]: 20

In [153]: l = [n for n in mf]

In [154]: len(l)
Out[154]: 20

In [155]: l
Out[155]: 
[1,
 1,
 2,
...
6765]


In [156]: mf0 = MyFib(0)

In [157]: len(mf0)
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-157-2e89b32ad3e4> in <module>()
----> 1 len(mf0)

/tmp/ipython_edit_TWcV1I.py in __len__(self)
     22             return self._length
     23         else:
---> 24             raise NotImplementedError
     25             # or simply return None / 0 depending
     26             # on implementation

NotImplementedError: 

In [158]: g = iter(mf0)

In [159]: l0 = [g.next(), g.next(), g.next()]

In [160]: l0
Out[160]: [1, 1, 2]
sleblanc
  • 3,821
  • 1
  • 34
  • 42
  • This is a solution to implement an iterator/generator which can provide a length to the `len()` function. You can derive your generator from this class by implementing your own `__iter__` method, and if required, your own `__init__` and `__len__` method. This pattern could be useful e.g. for some ORM-type object, where you execute a SQL query, then fetch results row-by-row using a cursor (via the iterator), and the `__len__` method gets the count from the actual SQL query. – sleblanc Mar 28 '13 at 19:54
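As a rough, self-contained sketch of that ORM-style pattern using the standard library's sqlite3 (the class, table and column names are made up for illustration):

import sqlite3

class SizedQuery:
    """Rows are fetched lazily through a cursor, while len() runs a COUNT(*) query."""

    def __init__(self, conn):
        self._conn = conn

    def __iter__(self):
        for (name,) in self._conn.execute("SELECT name FROM items"):
            yield name

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany("INSERT INTO items VALUES (?)", [("a",), ("b",), ("c",)])
q = SizedQuery(conn)
print(len(q))    # 3, from COUNT(*)
print(list(q))   # ['a', 'b', 'c'], fetched lazily row by row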
8

You can use enumerate() to loop through the generated data stream while keeping the index, then return the last index plus one -- the number of items.

I tried to use itertools.count() with itertools.izip() but no luck. This is the best/shortest answer I've come up with:

#!/usr/bin/python

import itertools

def func():
    for i in 'yummy beer':
        yield i

def icount(ifunc):
    size = -1 # for the case of an empty iterator
    for size, _ in enumerate(ifunc()):
        pass
    return size + 1

print list(func())
print 'icount', icount(func)

# ['y', 'u', 'm', 'm', 'y', ' ', 'b', 'e', 'e', 'r']
# icount 10
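On Python 3, the same sketch only needs print() calls:

print(list(func()))
print('icount', icount(func))

# ['y', 'u', 'm', 'm', 'y', ' ', 'b', 'e', 'e', 'r']
# icount 10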

Kamil Kisiel's solution is way better:

def count_iterable(i):
    return sum(1 for e in i)
glglgl
  • 89,107
  • 13
  • 149
  • 217
6

Use reduce(function, iterable[, initializer]) for a memory-efficient, purely functional solution:

>>> iter = "This string has 30 characters."
>>> reduce(lambda acc, e: acc + 1, iter, 0)
30
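On Python 3, reduce lives in functools, so the same idea (with a variable name that doesn't shadow the built-in iter) would look roughly like:

from functools import reduce

s = "This string has 30 characters."
print(reduce(lambda acc, _: acc + 1, s, 0))  # 30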
OlivierBlanvillain
  • 7,701
  • 4
  • 32
  • 51
  • Your timings are off because the iterator is being consumed. Only the first try at `len(list(iter))` is actually iterating over any values, all the others are counting a zero-length sequence. In my testing, `reduce` is slower than `len(list())`, `enumerate` and `sum`. – Blckknght Sep 05 '13 at 07:50
5

Try the more_itertools package for a simple solution. Example:

>>> import more_itertools

>>> it = iter("abcde")                                         # sample generator
>>> it
<str_iterator at 0x4ab3630>

>>> more_itertools.ilen(it)
5

See this post for another applied example.

Community
  • 1
  • 1
pylang
  • 40,867
  • 14
  • 129
  • 121
1

This is a hack, but if you really want len to work on a general iterable (consuming it in the process), you can create your own version of len.

The len function is essentially equivalent to the following (though implementations usually provide some optimizations to avoid the extra lookup):

def len(iterable):
    return iterable.__len__()

Therefore we can define our new_len to try that, and if __len__ does not exist, count the number of elements ourselves by consuming the iterable:

def new_len(iterable):
    try:
        return iterable.__len__()
    except AttributeError:
        return sum(1 for _ in iterable)

The above works in Python 2/3, and (as far as I know) should cover every conceivable kind of iterable.
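For example:

print(new_len([1, 2, 3]))             # 3, via __len__
print(new_len(x for x in range(5)))   # 5, by consuming the generator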

yoniLavi
  • 2,624
  • 1
  • 24
  • 30
  • 2
    Overriding a built-in function will mask the original behaviour, which leads to hard (or impossible) to debug code. You should really use a different name for the-function-that-must-not-be-named-len... – wouter bolsterlee Jun 10 '17 at 20:31