Is there an efficient way to know how many elements are in an iterator in Python, in general, without iterating through each and counting?
-
related: [Length of a finite generator](http://stackoverflow.com/q/18014437/4279) – jfs Jun 23 '15 at 20:21
20 Answers
This code should work:
>>> iter = (i for i in range(50))
>>> sum(1 for _ in iter)
50
Although it does iterate through each item and count them, it is the fastest way to do so.
It also works for when the iterator has no item:
>>> sum(1 for _ in range(0))
0
Of course, it runs forever for an infinite input, so remember that iterators can be infinite:
>>> sum(1 for _ in itertools.count())
[nothing happens, forever]
Also, be aware that the iterator will be exhausted by doing this, and further attempts to use it will see no elements. That's an unavoidable consequence of the Python iterator design. If you want to keep the elements, you'll have to store them in a list or something.
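For instance, a minimal sketch of that store-then-count approach (names are illustrative):
```
# Materialize the iterator once, then both the count and the items are available.
items = list(i for i in range(50))
print(len(items))   # 50
print(items[:3])    # [0, 1, 2] -- the elements can still be used afterwards
```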

-
24Looks to me like this does exactly what OP doesn't want to do: iterate through the iterator and count. – Adam Crossland Jul 27 '10 at 16:38
-
47This is a space-efficient way of counting the elements in an iterable – Captain Lepton Apr 16 '12 at 12:32
-
13While this isn't what OP wants, given that his question doesn't have an answer, this answer avoids instantiation of a list, and it is empirically faster by a constant than the reduce method listed above. – Phillip Nordwall Aug 02 '12 at 15:27
-
22@AloisMahdal No. It is conventional in Python to use the name `_` for a dummy variable whose value you don't care about. – Taymon Sep 30 '13 at 03:46
-
@Taymon and @AloisMahdal: `_` should be equally expensive as writing `x` or some other variable, because it does get assigned the current item returned from the generator. Try `a, _ = (0, 1); print(_)`. – 0 _ Aug 29 '16 at 09:11
-
Note that all calls to `sum(1 for _ in iter)` after the first one will return zero, unless you find some way to "rewind" the iterator each time. – coccoinomane Feb 21 '17 at 16:55
-
1@AloisMahdal Using `_` will also help avoid annoying syntax warnings in PyCharm if you try using `x` or some other name. – Evan Zamir Feb 22 '17 at 00:17
-
1@PhillipNordwall *given that his question doesn't have an answer* - their question does have answer, the answer is *no*. – Alois Mahdal Feb 24 '17 at 23:58
-
This will loop forever for an infinite generator such as `def foo(): while True: yield 1`. Deciding this in general is the halting problem, so no function can do it. – Gulzar Jan 16 '22 at 15:58
No. It's not possible.
Example:
import random
def gen(n):
    for i in xrange(n):
        if random.randint(0, 1) == 0:
            yield i

iterator = gen(10)
The length of `iterator` is unknown until you iterate through it.
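To see the point concretely, here is a small illustration (not part of the original answer) of consuming that iterator, which is the only way to learn its length in this example:
```
# The count is only known after consuming the iterator, and it varies per run.
print(sum(1 for _ in iterator))  # some value between 0 and 10
```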

-
18Alternately, `def gen(): while True: yield random.randint(0, 1)` is infinite, so you will never be able to find a length by iterating through it. – tgray Jul 27 '10 at 17:15
-
2So, to validate the obvious: the best way to get the "size" of an iterator is simply to count the number of times you've gone through the iteration, right? In this case, it'd be `numIters = 0 ; while iterator: numIters +=1`? – Mike Williamson Mar 11 '14 at 23:33
No, any method will require you to resolve every result. You can do
iter_length = len(list(iterable))
but running that on an infinite iterator will of course never return. It also will consume the iterator and it will need to be reset if you want to use the contents.
Telling us what real problem you're trying to solve might help us find you a better way to accomplish your actual goal.
Edit: Using `list()` will read the whole iterable into memory at once, which may be undesirable. Another way is to do
sum(1 for _ in iterable)
as another person posted. That will avoid keeping it in memory.
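If you need both the count and the contents afterwards, one option (not from the original answer, just a sketch) is `itertools.tee`; note that it still buffers everything in memory, much like `list()`:
```
import itertools

def count_and_keep(iterable):
    # tee gives two independent iterators over the same underlying data;
    # counting one of them forces tee to buffer all items for the other,
    # so memory usage is still O(n).
    counted, kept = itertools.tee(iterable)
    return sum(1 for _ in counted), kept

n, it = count_and_keep(x * x for x in range(5))
print(n)          # 5
print(list(it))   # [0, 1, 4, 9, 16]
```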
-
the problem is I am reading a file with "pysam" that has millions of entries. Pysam returns an iterator. To compute a certain quantity, I need to know how many reads are in the file, but I don't need to read each one... that's the issue. – Jul 27 '10 at 16:42
-
9I'm not a pysam user, but it's probably reading the file lazily. That makes sense because you don't want to have the whole file in memory. So if you must know the number of records before iterating, the only way is to create two iterators and use the first one to count elements and the second one to read the file. BTW, don't use `len(list(iterable))`; it will load all data into memory. You can use: `reduce(lambda x, _: x+1, iterable, 0)`. Edit: Zonda333's code with sum is also good. – Tomasz Wysocki Jul 27 '10 at 16:48
-
1@user248237: why do you say you need to know how many entries are available to compute a certain quantity ? You could just read a fixed amount of them and manage the case when there is less than that fixed amount (really simple to do using iterslice). Is there another reason you have to read all entries ? – kriss Jul 27 '10 at 16:59
-
1@Tomasz Note that the built-in `reduce` is gone in Python 3 and up; it lives in `functools.reduce`. – Wilduck Jul 27 '10 at 18:39
-
One way of putting this is: iterators store the means to get values, not the values themselves, so by definition they have no length. BUT you may have a range in some cases, e.g. you can iterate through the values in permutations. – Aaron Newton Apr 21 '12 at 04:31
-
I tried this version of iterator length vs. others using this iPython notebook. It is the fastest by a small margin (compared to len(tuple(iterator))) https://gist.github.com/NelsonMinar/90212fbbfc6465c8e263341b86aa01a8 – rhaps0dy May 03 '17 at 10:15
You cannot (except when the type of a particular iterator implements some specific methods that make it possible).
Generally, you may count iterator items only by consuming the iterator. One of the most efficient ways is probably:
import itertools
from collections import deque
def count_iter_items(iterable):
    """
    Consume an iterable not reading it into memory; return the number of items.
    """
    counter = itertools.count()
    deque(itertools.izip(iterable, counter), maxlen=0)  # (consume at C speed)
    return next(counter)
(For Python 3.x, replace `itertools.izip` with `zip`.)
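For reference, a sketch of how the same function would look on Python 3.x:
```
import itertools
from collections import deque

def count_iter_items(iterable):
    """Consume an iterable without storing it in memory; return the item count."""
    counter = itertools.count()
    deque(zip(iterable, counter), maxlen=0)  # consume at C speed
    return next(counter)

print(count_iter_items(x for x in range(1000)))  # 1000
```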

-
4+1: in a time comparison with `sum(1 for _ in iterator)`, this was almost twice as fast. – augustomen May 13 '14 at 18:56
-
1It's more accurate to say that it consumes an iterable by reading each item into memory and discarding it right away. – Rockallite Feb 22 '19 at 06:59
-
It's important to note (which I overlooked) that the _order of arguments to `zip` matters_: if you pass `zip(counter, iterable)`, you'll actually get 1 more than the iterable count! – Kye W Shi Jul 26 '19 at 05:08
Kinda. You could check the `__length_hint__` method, but be warned that (at least up to Python 3.4, as gsnedders helpfully points out) it's an undocumented implementation detail (following message in thread) that could very well vanish or summon nasal demons instead.
Otherwise, no. Iterators are just objects that only expose the `next()` method. You can call it as many times as required, and they may or may not eventually raise `StopIteration`. Luckily, this behaviour is (most of the time) transparent to the coder. :)
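For illustration, `operator.length_hint()` (added in Python 3.4 by PEP 424) wraps this mechanism; it is only a hint, and iterators that cannot estimate their remaining length simply fall back to the default:
```
import operator

print(operator.length_hint(iter([1, 2, 3])))             # 3 -- list iterators know what's left
print(operator.length_hint((x for x in range(5)), -1))   # -1 -- generators give no hint
```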

-
5This is no longer the case, as of [PEP 424](http://legacy.python.org/dev/peps/pep-0424/) and Python 3.4. `__length_hint__` is now documented, but it is *a hint* and makes no guarantees of accuracy. – gsnedders Jul 18 '14 at 11:47
A quick benchmark:
import collections
import itertools
def count_iter_items(iterable):
    counter = itertools.count()
    collections.deque(itertools.izip(iterable, counter), maxlen=0)
    return next(counter)

def count_lencheck(iterable):
    if hasattr(iterable, '__len__'):
        return len(iterable)
    d = collections.deque(enumerate(iterable, 1), maxlen=1)
    return d[0][0] if d else 0

def count_sum(iterable):
    return sum(1 for _ in iterable)
iter = lambda y: (x for x in xrange(y))
%timeit count_iter_items(iter(1000))
%timeit count_lencheck(iter(1000))
%timeit count_sum(iter(1000))
The results:
10000 loops, best of 3: 37.2 µs per loop
10000 loops, best of 3: 47.6 µs per loop
10000 loops, best of 3: 61 µs per loop
I.e., the simple `count_iter_items` is the way to go.
Adjusting this for Python 3:
61.9 µs ± 275 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
74.4 µs ± 190 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
82.6 µs ± 164 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

I like the cardinality package for this; it is very lightweight and tries to use the fastest possible implementation available depending on the iterable.
Usage:
>>> import cardinality
>>> cardinality.count([1, 2, 3])
3
>>> cardinality.count(i for i in range(500))
500
>>> def gen():
... yield 'hello'
... yield 'world'
>>> cardinality.count(gen())
2
The actual `count()` implementation is as follows:
def count(iterable):
    if hasattr(iterable, '__len__'):
        return len(iterable)
    d = collections.deque(enumerate(iterable, 1), maxlen=1)
    return d[0][0] if d else 0

-
I assume you can still iterate over the iterator if you use that function, yes? – jcollum Dec 19 '19 at 19:41
-
@jcollum Looking at the code given for `count` at the end of this answer, the iterable is consumed if it doesn't have a `.__len__` attribute. If it's a "one-use" object such as a generator, then it will be empty after calling `count` on it. – Stef Nov 16 '21 at 09:20
So, for those who would like to know the summary of that discussion: the final top scores for counting a 50-million-item generator expression using

- `len(list(gen))`
- `len([_ for _ in gen])`
- `sum(1 for _ in gen)`
- `ilen(gen)` (from more_itertools)
- `reduce(lambda c, i: c + 1, gen, 0)`

sorted by performance of execution (including memory consumption), may surprise you:
```
1: test_list.py:8: 0.492 KiB
gen = (i for i in data*1000); t0 = monotonic(); len(list(gen))
('list, sec', 1.9684218849870376)
2: test_list_compr.py:8: 0.867 KiB
gen = (i for i in data*1000); t0 = monotonic(); len([i for i in gen])
('list_compr, sec', 2.5885991149989422)
3: test_sum.py:8: 0.859 KiB
gen = (i for i in data*1000); t0 = monotonic(); sum(1 for i in gen); t1 = monotonic()
('sum, sec', 3.441088170016883)
4: more_itertools/more.py:413: 1.266 KiB
d = deque(enumerate(iterable, 1), maxlen=1)
test_ilen.py:10: 0.875 KiB
gen = (i for i in data*1000); t0 = monotonic(); ilen(gen)
('ilen, sec', 9.812256851990242)
5: test_reduce.py:8: 0.859 KiB
gen = (i for i in data*1000); t0 = monotonic(); reduce(lambda counter, i: counter + 1, gen, 0)
('reduce, sec', 13.436614598002052)
```
So, `len(list(gen))` is the most common approach and, in this benchmark, it was also the fastest and consumed the least memory.

-
3Can you explain why `len(list(gen))` should consume less memory than the approach based on reduce? The former creates a new `list` that involves memory allocation while the latter should not. So I'd expect the latter to be more memory efficient. Also, memory consumption will depend on the element type. – normanius Nov 11 '19 at 14:06
-
FYI: I can reproduce for python 3.6.8 (on a MacBookPro) that method 1 outperforms the other methods in terms of runtime (I skipped method 4). – normanius Nov 11 '19 at 14:21
-
2`len(tuple(iterable))` can be even more efficient: [article by Nelson Minar](https://nelsonslog.wordpress.com/2016/04/06/python3-no-len-for-iterators/) – VMAtm Nov 14 '19 at 09:36
-
Could you please specify which numbers represent a time, and in which unit? I see KiB in your answer which is obviously for memory consumption, but I don't see any numbers in seconds or any other time unit. – Stef Sep 03 '21 at 22:59
-
@Stef in tuples - ('list, sec', 1.9684218849870376) etc., after each code line – Alex-Bogdanov Nov 15 '21 at 21:13
-
1@normanius There's also no way that making a list from a "50 million-lengthed generator expression" only takes a few hundred bytes. Obviously wrong. Bad answer. – Kelly Bundy Mar 27 '23 at 16:21
An iterator is just an object which has a pointer to the next object to be read by some kind of buffer or stream, it's like a LinkedList where you don't know how many things you have until you iterate through them. Iterators are meant to be efficient because all they do is tell you what is next by references instead of using indexing (but as you saw you lose the ability to see how many entries are next).
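As a quick illustration of that point (a sketch, not from the original answer): a generator can produce its next value, but it has no notion of length:
```
gen = (c for c in "abc")
print(next(gen))      # 'a' -- the iterator knows how to get the next value
try:
    len(gen)
except TypeError as e:
    print(e)          # object of type 'generator' has no len()
```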

-
2An iterator is nothing like a linked list. An object returned from an iterator does not point to the next object, and these objects are not (necessarily) stored in memory. Rather, it can yield objects one after the other, based on whatever inner logic (which could be, but does not have to be, based on a stored list). – Tom May 30 '13 at 20:09
-
1@Tom I was using LinkedList as an example mostly in that you don't know how much you have, since you only know what's next in a sense (if there is something). I apologize if my wording seems a little off or if I implied that they are one and the same. – Jesus Ramos May 30 '13 at 20:27
Regarding your original question, the answer is still that there is no way in general to know the length of an iterator in Python.
Given that your question is motivated by an application of the pysam library, I can give a more specific answer: I'm a contributor to PySAM, and the definitive answer is that SAM/BAM files do not provide an exact count of aligned reads. Nor is this information easily available from a BAM index file. The best one can do is to estimate the approximate number of alignments by using the location of the file pointer after reading a number of alignments and extrapolating based on the total size of the file. This is enough to implement a progress bar, but not a method of counting alignments in constant time.
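A rough sketch of that extrapolation idea; the reader object and its `tell()` semantics are simplifying assumptions here (real BAM readers use BGZF virtual offsets, so the arithmetic would need adjusting):
```
import os

def estimate_record_count(path, reader, sample_size=10000):
    # Read a sample of records, see how many bytes that consumed,
    # and scale up by the total file size.  This is an estimate only,
    # suitable for a progress bar, not an exact count.
    total_bytes = os.path.getsize(path)
    start = reader.tell()
    n = 0
    for _ in reader:
        n += 1
        if n >= sample_size:
            break
    consumed = reader.tell() - start
    if consumed <= 0:
        return n  # the sample covered the whole file (or tell() is unusable)
    return int(n * total_bytes / consumed)
```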

There are two ways to get the length of "something" on a computer.
The first way is to store a count - this requires anything that touches the file/data to modify it (or a class that only exposes interfaces -- but it boils down to the same thing).
The other way is to iterate over it and count how big it is.
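A minimal sketch of the second approach, counting as you iterate with a small wrapper (the class name and interface are illustrative, not from the original answer):
```
class CountingIterator:
    """Wrap an iterator and keep a running count of the items yielded so far."""
    def __init__(self, iterable):
        self._it = iter(iterable)
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        value = next(self._it)  # propagates StopIteration when exhausted
        self.count += 1
        return value

it = CountingIterator(x for x in range(7))
for _ in it:
    pass
print(it.count)  # 7
```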

I thought it could be worthwhile to have a micro-benchmark comparing the run-times of the different approaches mentioned here.
Disclaimer: I'm using `simple_benchmark` (a library written by me) for the benchmarks and also include `iteration_utilities.count_items` (a function in a third-party library written by me).
To provide a more differentiated result I've done two benchmarks: one including only the approaches that don't build an intermediate container just to throw it away, and one including these:
from simple_benchmark import BenchmarkBuilder
import more_itertools as mi
import iteration_utilities as iu

b1 = BenchmarkBuilder()
b2 = BenchmarkBuilder()

@b1.add_function()
@b2.add_function()
def summation(it):
    return sum(1 for _ in it)

@b1.add_function()
def len_list(it):
    return len(list(it))

@b1.add_function()
def len_listcomp(it):
    return len([_ for _ in it])

@b1.add_function()
@b2.add_function()
def more_itertools_ilen(it):
    return mi.ilen(it)

@b1.add_function()
@b2.add_function()
def iteration_utilities_count_items(it):
    return iu.count_items(it)

@b1.add_arguments('length')
@b2.add_arguments('length')
def argument_provider():
    for exp in range(2, 18):
        size = 2**exp
        yield size, [0]*size

r1 = b1.run()
r2 = b2.run()

import matplotlib.pyplot as plt
f, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=[15, 18])
r1.plot(ax=ax2)
r2.plot(ax=ax1)
plt.savefig('result.png')
The results were two plots (not reproduced here). They use log-log axes so that all ranges (small values, large values) can be inspected. Since the plots are intended for qualitative comparison, the actual values aren't too interesting. In general, the y-axis (vertical) represents the time and the x-axis (horizontal) represents the number of elements in the input "iterable". Lower on the vertical axis means faster.
The upper plot shows the approaches where no intermediate list was used. It shows that the `iteration_utilities` approach was fastest, followed by `more_itertools`; the slowest was `sum(1 for _ in iterator)`.
The lower plot also includes the approaches that use `len()` on an intermediate list, once with `list` and once with a list comprehension. The approach with `len(list)` was fastest here, but the difference to the `iteration_utilities` approach is almost negligible. The approach using the comprehension was significantly slower than using `list` directly.
Summary
Every approach mentioned here showed a dependency on the length of the input and iterated over every element in the iterable. There is no way to get the length without the iteration (even if the iteration is hidden).
If you don't want third-party extensions, then using `len(list(iterable))` is definitely the fastest of the tested approaches; however, it generates an intermediate list which could use significantly more memory.
If you don't mind additional packages, then `iteration_utilities.count_items` would be almost as fast as the `len(list(...))` approach but doesn't require additional memory.
However, it's important to note that the micro-benchmark used a list as input. The result of the benchmark could be different depending on the iterable whose length you want. I also tested with `range` and a simple generator expression, and the trends were very similar; however, I cannot exclude the possibility that the timings would change depending on the type of input.

-
*"the micro-benchmark used a list as input"* - Then `len_list` had an inappropriate advantage, as it cheats: It doesn't use an iterator (what the question is about!) but directly copies the pointers from the source list's internal data. I'd use an `itertools.repeat` iterator instead. – Kelly Bundy Mar 27 '23 at 18:42
-
Running your code as-is, `len_list` is about 1.4 times faster than `iteration_utilities_count_items` for me. Using `repeat` iterators instead, it's about 2 times slower than it. – Kelly Bundy Mar 27 '23 at 18:57
I decided to re-run the benchmarks on a modern version of Python and found a near-complete inversion of the earlier benchmark results.
I ran the following commands:
py -m timeit -n 10000000 -s "it = iter(range(1000000))" -s "from collections import deque" -s "from itertools import count" -s "def itlen(x):" -s " return len(tuple(x))" -- "itlen(it)"
py -m timeit -n 10000000 -s "it = iter(range(1000000))" -s "from collections import deque" -s "from itertools import count" -s "def itlen(x):" -s " return len(list(x))" -- "itlen(it)"
py -m timeit -n 10000000 -s "it = iter(range(1000000))" -s "from collections import deque" -s "from itertools import count" -s "def itlen(x):" -s " return sum(map(lambda i: 1, x))" -- "itlen(it)"
py -m timeit -n 10000000 -s "it = iter(range(1000000))" -s "from collections import deque" -s "from itertools import count" -s "def itlen(x):" -s " return sum(1 for _ in x)" -- "itlen(it)"
py -m timeit -n 10000000 -s "it = iter(range(1000000))" -s "from collections import deque" -s "from itertools import count" -s "def itlen(x):" -s " d = deque(enumerate(x, 1), maxlen=1)" -s " return d[0][0] if d else 0" -- "itlen(it)"
py -m timeit -n 10000000 -s "it = iter(range(1000000))" -s "from collections import deque" -s "from itertools import count" -s "def itlen(x):" -s " counter = count()" -s " deque(zip(x, counter), maxlen=0)" -s " return next(counter)" -- "itlen(it)"
These are equivalent to timing each of the following `itlen*(it)` functions:
it = iter(range(1000000))
from collections import deque
from itertools import count

def itlen1(x):
    return len(tuple(x))

def itlen2(x):
    return len(list(x))

def itlen3(x):
    return sum(map(lambda i: 1, x))

def itlen4(x):
    return sum(1 for _ in x)

def itlen5(x):
    d = deque(enumerate(x, 1), maxlen=1)
    return d[0][0] if d else 0

def itlen6(x):
    counter = count()
    deque(zip(x, counter), maxlen=0)
    return next(counter)
On a Windows 11, Python 3.11 machine with an AMD Ryzen 7 5800H and 16 GB RAM, I got the following output:
10000000 loops, best of 5: 103 nsec per loop
10000000 loops, best of 5: 107 nsec per loop
10000000 loops, best of 5: 138 nsec per loop
10000000 loops, best of 5: 164 nsec per loop
10000000 loops, best of 5: 338 nsec per loop
10000000 loops, best of 5: 425 nsec per loop
This suggests that `len(list(x))` and `len(tuple(x))` are tied; followed by `sum(map(lambda i: 1, x))`; then closely by `sum(1 for _ in x)`; the other, more complicated methods mentioned by other answers and/or used in cardinality are at least a factor of two slower still.

-
You should've been shocked that they iterated over a million numbers in less than a microsecond. That's of course not what happened. – Kelly Bundy Mar 27 '23 at 20:52
This is against the very definition of an iterator, which is a pointer to an object, plus information about how to get to the next object.
An iterator does not know how many more times it will be able to iterate until terminating. This could be infinite, so infinity might be your answer.

-
1It's not violating anything, and there is nothing wrong with applying prior knowledge when using an iterator. There are zillions of iterators around where you know that the number of elements is limited. Think about simply filtering a list: you can easily give the maximum length, you just don't really know how many of the elements actually fit your filter condition. Wanting to know the number of matching elements is a valid application, not violating any mystical idea of an iterator. – Michael Jun 12 '17 at 13:17
It's common practice to put this type of information in the file header, and for pysam to give you access to this. I don't know the format, but have you checked the API?
As others have said, you can't know the length from the iterator.

Although it's not possible in general to do what's been asked, it's still often useful to have a count of how many items were iterated over after having iterated over them. For that, you can use jaraco.itertools.Counter or similar. Here's an example using Python 3 and rwt to load the package.
$ rwt -q jaraco.itertools -- -q
>>> import jaraco.itertools
>>> items = jaraco.itertools.Counter(range(100))
>>> _ = list(items)
>>> items.count
100
>>> import random
>>> def gen(n):
... for i in range(n):
... if random.randint(0, 1) == 0:
... yield i
...
>>> items = jaraco.itertools.Counter(gen(100))
>>> _ = list(items)
>>> items.count
48

One simple way is to use the built-in functions `set()` or `list()`:
A: `set()`, in case the iterator has no duplicate items (fastest way)
iter = zip([1,2,3],['a','b','c'])
print(len(set(iter)))  # set(iter) = {(1, 'a'), (2, 'b'), (3, 'c')}
Out[45]: 3
or
iter = range(1,10)
print(len(set(iter)))  # set(iter) = {1, 2, 3, 4, 5, 6, 7, 8, 9}
Out[47]: 9
B: `list()`, in case the iterator contains duplicate items
iter = (1,2,1,2,1,2,1,2)
print(len(list(iter)))  # list(iter) = [1, 2, 1, 2, 1, 2, 1, 2]
Out[49]: 8
# compare with set function
print(len(set(iter)))  # set(iter) = {1, 2}
Out[51]: 2

-
This only works if you have unique items in the iterable. If the same value is yielded twice, this will undercount. – AbyxDev Dec 25 '22 at 17:30
-
1@AbyxDev Post updated for the assumption of duplicated items in iterator – imanzabet Dec 26 '22 at 16:11
This is theoretically impossible: this is, in fact, the Halting Problem.
Proof
Assume, for contradiction, that it were possible to determine the length (or infinite length) of any generator `g`, using a function `len(g)`.
For any program `P`, let us now convert `P` into a generator `g(P)`:
For every return or exit point in `P`, yield a value instead of returning it.
If `len(g(P)) == infinity`, `P` doesn't stop.
This solves the Halting Problem, which is known to be impossible (see Wikipedia). Contradiction.
Thus, it is impossible to count the elements of a generic generator without iterating over it (== actually running through the program).
More concretely, consider
def g():
    while True:
        yield "more?"
The length is infinite. There are infinitely many such generators.

def count_iter(iter):
    sum = 0
    for _ in iter:
        sum += 1
    return sum

-
It was written explicitly in the question that iterating is not a valid option. This will also loop forever for infinite generators. – Gulzar Jan 16 '22 at 16:42
Presumably, you want to count the number of items without iterating through them, so that the iterator is not exhausted and you can use it again later. This is possible with `copy` or `deepcopy`:
import copy
def get_iter_len(iterator):
    return sum(1 for _ in copy.copy(iterator))

###############################################

iterator = range(0, 10)
print(get_iter_len(iterator))

if len(tuple(iterator)) > 1:
    print("Finding the length did not exhaust the iterator!")
else:
    print("oh no! it's all gone")
The output is "Finding the length did not exhaust the iterator!"
Optionally (and unadvisedly), you can shadow the built-in `len` function as follows:
import copy
def len(obj, *, len=len):
    try:
        if hasattr(obj, "__len__"):
            r = len(obj)
        elif hasattr(obj, "__next__"):
            r = sum(1 for _ in copy.copy(obj))
        else:
            r = len(obj)
    finally:
        pass
    return r

-
2Ranges aren't iterators. There are some iterator types that can be copied, but others will cause this code to fail with a TypeError (e.g. generators), and iterating through a copied iterator may cause side effects to happen twice, or cause arbitrary breakage in code that, say, returned a `map` iterator expecting the resulting function calls to only happen once. – user2357112 Nov 09 '19 at 02:39