685

How can I create an iterator in Python?

For example, suppose I have a class whose instances logically "contain" some values:

class Example:
    def __init__(self, values):
        self.values = values

I want to be able to write code like:

e = Example([1, 2, 3])
# Each time through the loop, expose one of the values from e.values
for value in e:
    print("The example object contains", value)

More generally, the iterator should be able to control where the values come from, or even compute them on the fly (rather than considering any particular attribute of the instance).

Karl Knechtel
akdom

10 Answers

753

Iterator objects in Python conform to the iterator protocol, which basically means they provide two methods: __iter__() and __next__().

  • The __iter__() method returns the iterator object and is implicitly called once at the start of a loop.

  • The __next__() method returns the next value and is implicitly called at each loop iteration. It raises a StopIteration exception when there are no more values to return, which looping constructs implicitly catch in order to stop iterating.
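
As a rough sketch of what the two bullets above describe, a for loop can be written out by hand with the built-in iter() and next() functions. The helper name manual_for is hypothetical and exists only for illustration; the real loop machinery lives inside the interpreter.

```python
def manual_for(iterable, body):
    """Roughly what `for item in iterable: body(item)` does (illustrative helper)."""
    iterator = iter(iterable)      # calls iterable.__iter__() once, before looping
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__() on each pass
        except StopIteration:      # raised by the iterator when it is exhausted
            break
        body(item)

collected = []
manual_for([1, 2, 3], collected.append)
print(collected)
```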

Here's a simple example of a counter:

class Counter:
    def __init__(self, low, high):
        self.current = low - 1
        self.high = high

    def __iter__(self):
        return self

    def __next__(self): # Python 2: def next(self)
        self.current += 1
        if self.current < self.high:
            return self.current
        raise StopIteration


for c in Counter(3, 9):
    print(c)

This will print:

3
4
5
6
7
8

This is easier to write using a generator, as covered in a previous answer:

def counter(low, high):
    current = low
    while current < high:
        yield current
        current += 1

for c in counter(3, 9):
    print(c)

The printed output will be the same. Under the hood, the generator object supports the iterator protocol and does something roughly similar to the class Counter.
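
A quick way to check that claim, assuming the counter generator function from this answer (repeated here so the snippet is self-contained): the generator object returns itself from iter(), responds to next(), and stays empty once exhausted.

```python
def counter(low, high):
    current = low
    while current < high:
        yield current
        current += 1

gen = counter(3, 6)
is_own_iterator = iter(gen) is gen   # a generator object is its own iterator
first = next(gen)                    # runs the function body up to the first yield
rest = list(gen)                     # consumes whatever is left
leftover = list(gen)                 # an exhausted generator produces nothing more
print(is_own_iterator, first, rest, leftover)
```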

David Mertz's article, Iterators and Simple Generators, is a pretty good introduction.

pambda
ars
  • 8
    This is mostly a good answer, but the fact that it returns self is a little sub-optimal. For example, if you used the same counter object in a doubly nested for loop you would probably not get the behavior that you meant. – Casey Rodarmor Feb 06 '14 at 23:33
  • 37
    No, iterators SHOULD return themselves. Iterables return iterators, but iterables shouldn't implement `__next__`. `counter` is an iterator, but it is not a sequence. It doesn't store its values. You shouldn't be using the counter in a doubly-nested for-loop, for example. – leewz Feb 21 '14 at 08:42
  • 5
    In the Counter example, self.current should be assigned in `__iter__` (in addition to in `__init__`). Otherwise, the object can be iterated only once. E.g., if you say `ctr = Counters(3, 8)`, then you cannot use `for c in ctr` more than once. – Curt Apr 05 '16 at 23:00
  • shouldn't the \_\_iter\_\_ code be setting the value of self.current? – kdubs Mar 13 '17 at 02:01
  • 10
    @Curt: Absolutely not. `Counter` is an iterator, and iterators are only supposed to be iterated once. If you reset `self.current` in `__iter__`, then a nested loop over the `Counter` would be completely broken, and all sorts of assumed behaviors of iterators (that calling `iter` on them is idempotent) are violated. If you want to be able to iterate `ctr` more than once, it needs to be a non-iterator iterable, where it returns a brand new iterator each time `__iter__` is invoked. Trying to mix and match (an iterator that is implicitly reset when `__iter__` is invoked) violates the protocols. – ShadowRanger Feb 24 '18 at 01:16
  • 3
    For example, if `Counter` was to be a non-iterator iterable, you'd remove the definition of `__next__`/`next` entirely, and probably redefine `__iter__` as a generator function of the same form as the generator described at the end of this answer (except instead of the bounds coming from arguments to `__iter__`, they'd be arguments to `__init__` saved on `self` and accessed from `self` in `__iter__`). – ShadowRanger Feb 24 '18 at 01:19
  • 1
    BTW, a useful thing to do if you want to write portable iterator classes is to define either `next` or `__next__`, then assign one name to the other (`next = __next__` or `__next__ = next` depending on the name you gave the method). Having both names defined means it works on both Py2 and Py3 without source code changes. – ShadowRanger Feb 24 '18 at 01:43
  • Thanks for the answer. To clarify an ambiguity: `__iter__()` gets called *once* before entering the looping construct. *"... at the start of loops"* suggests `__iter__` gets called at the beginning of each loop in the same looping construct, which is false. A double nested for loop using `Counter` will show that `__iter__` gets called once each time and before the nested for-loop executes. – Minh Tran May 19 '18 at 21:24
530

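There are four ways to build an iterative function:

  • create a generator (uses the yield keyword)
  • use a generator expression (genexp)
  • create an iterator (defines __iter__ and __next__, or next in Python 2.x)
  • create a class that Python can iterate over on its own (defines __getitem__)

Examples: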

# generator
def uc_gen(text):
    for char in text.upper():
        yield char

# generator expression
def uc_genexp(text):
    return (char for char in text.upper())

# iterator protocol
class uc_iter():
    def __init__(self, text):
        self.text = text.upper()
        self.index = 0
    def __iter__(self):
        return self
    def __next__(self):
        try:
            result = self.text[self.index]
        except IndexError:
            raise StopIteration
        self.index += 1
        return result

# getitem method
class uc_getitem():
    def __init__(self, text):
        self.text = text.upper()
    def __getitem__(self, index):
        return self.text[index]

To see all four methods in action:

for iterator in uc_gen, uc_genexp, uc_iter, uc_getitem:
    for ch in iterator('abcde'):
        print(ch, end=' ')
    print()

Which results in:

A B C D E
A B C D E
A B C D E
A B C D E

Note:

The two generator types (uc_gen and uc_genexp) cannot be passed to reversed(). The plain iterator (uc_iter) would need the __reversed__ magic method (which, according to the docs, must return a new iterator, although returning self works, at least in CPython). The getitem iterable (uc_getitem) must have the __len__ magic method:

    # for uc_iter we add __reversed__ and update __next__
    def __reversed__(self):
        self.index = -1
        return self
    def __next__(self):
        try:
            result = self.text[self.index]
        except IndexError:
            raise StopIteration
        self.index += -1 if self.index < 0 else +1
        return result

    # for uc_getitem
    def __len__(self):
        return len(self.text)
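
As a small check of the note above, here is a sketch of the getitem approach with __len__ added. With both methods present and no __reversed__, the built-in reversed() falls back to the sequence protocol. The class name UcSeq is made up for this example.

```python
class UcSeq:
    def __init__(self, text):
        self.text = text.upper()
    def __getitem__(self, index):
        return self.text[index]   # the IndexError past the end stops a for loop
    def __len__(self):
        return len(self.text)     # reversed() needs this to know where to start

forward = list(UcSeq('abc'))
backward = list(reversed(UcSeq('abc')))
print(forward, backward)
```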

To answer Colonel Panic's secondary question about an infinite lazily evaluated iterator, here are those examples, using each of the four methods above:

# generator
def even_gen():
    result = 0
    while True:
        yield result
        result += 2


# generator expression
def even_genexp():
    return (num for num in even_gen())  # or even_iter or even_getitem
                                        # not much value under these circumstances

# iterator protocol
class even_iter():
    def __init__(self):
        self.value = 0
    def __iter__(self):
        return self
    def __next__(self):
        next_value = self.value
        self.value += 2
        return next_value

# getitem method
class even_getitem():
    def __getitem__(self, index):
        return index * 2

import random
for iterator in even_gen, even_genexp, even_iter, even_getitem:
    limit = random.randint(15, 30)
    count = 0
    for even in iterator():
        print(even, end=' ')
        count += 1
        if count >= limit:
            break
    print()

Which results in (at least for my sample run):

0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32

How to choose which one to use? This is mostly a matter of taste. The two methods I see most often are generators and the iterator protocol, as well as a hybrid (__iter__ returning a generator).

Generator expressions are useful for replacing list comprehensions (they are lazy and so can save on resources).
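
To illustrate the laziness mentioned above, the following sketch records when each element is actually computed. The helper traced_square and the log list are hypothetical names used only here.

```python
log = []

def traced_square(n):
    log.append(n)          # record that this element was actually computed
    return n * n

eager = [traced_square(n) for n in range(3)]   # list comprehension: runs immediately
after_list = list(log)

log.clear()
lazy = (traced_square(n) for n in range(3))    # generator expression: nothing computed yet
after_genexp = list(log)

first = next(lazy)                             # computes exactly one element
after_next = list(log)
print(after_list, after_genexp, after_next)
```

The list comprehension has done all of its work by the time it is assigned, while the generator expression only does work as values are requested.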

If one needs compatibility with earlier Python 2.x versions, use __getitem__.

Ethan Furman
  • 4
    I like this summary because it is complete. Those three ways (yield, generator expression and iterator) are essentially the same, although some are more convenient than others. The yield operator captures the "continuation" which contains the state (for example the index that we are up to). The information is saved in the "closure" of the continuation. The iterator way saves the same information inside the fields of the iterator, which is essentially the same thing as a closure. The __getitem__ method is a little different because it indexes into the contents and is not iterative in nature. – Ian Jul 05 '13 at 01:04
  • You aren't incrementing the index in your last approach, `uc_getitem()` . Actually on reflection, it shouldnt increment the index, because it is not maintaining it. But it also is not a way to abstract iteration. – Terrence Brannon Nov 05 '13 at 15:25
  • 2
    @metaperl: Actually, it is. In all four of the above cases you can use the same code to iterate. – Ethan Furman Nov 05 '13 at 16:37
  • @EthanFurman I am not an expert, but should there not be a reset of the index in the uc_iter class? I.e. inside __iter__ method set self.index to 0 so that the next invocation of the iterator works – Asterisk Apr 19 '18 at 09:30
  • 1
    @Asterisk: No, an instance of `uc_iter` should expire when it's done (otherwise it would by infinite); if you want to do it again you have to get a new iterator by calling `uc_iter()` again. – Ethan Furman Apr 19 '18 at 16:13
  • @TerrenceBrannon uc_getitem() works, but perhaps can be considered an option for backwards compatibility. [It stops when an IndexError is raised](https://stackoverflow.com/a/926645/1048186). – Josiah Yoder Jul 27 '18 at 14:35
  • 3
    You can set `self.index = 0` in `__iter__` so that you can iterate many times over. Otherwise you can't. – John Strood Aug 14 '18 at 08:26
  • 1
    If you could spare the time I would appreciate an explanation for why you would choose any of the methods over the others. – aaaaaa Jan 21 '19 at 04:15
  • 1
    @JohnStrood Absolutely not, because it would violate the iterator protocol. Iterators are expected to just return themselves in `__iter__()` so `iter(iterator_instance)` doesn't change the state of the given iterator instance. – BlackJack Jun 27 '19 at 14:02
  • 1
    @JohnStrood to iterate over a complex `MyClass` object more than once, create a `MyClassIterator` class with just `__init__` and `__next__` and return an instance of that from `MyClass.__iter__` (e.g. `return MyClassIterator(self)`) instead of just `self`, so that `MyClassIterator` can store a reference to the `MyClass` instance as well as the current data index, making it safe to be called multiple times at once while still satisfying the other iterator protocol issues. – simpleuser Aug 27 '19 at 20:40
  • 1
    @aaaaaa: "How to choose" section added (at the bottom). – Ethan Furman Aug 28 '19 at 15:57
  • @VaradhanWork: Thanks for the suggestion! In the future, please use comments to point out flaws instead of making large edits. – Ethan Furman Feb 12 '20 at 17:30
  • @EthanFurman Can you please explain why we do this after adding __reversed__: `self.index += -1 if self.index < 0 ` When would the index be less than 0? – coderWorld Apr 12 '20 at 23:33
  • 1
    @coderWorld: `__reversed__` sets `self.index` to `-1` so each call to `__next__` will reduce it one more -- this has the effect of going backwards: `'red'[-1] -> 'd', 'red'[-2] -> 'e', 'red'[-3] -> 'r', 'red'[-4] -> StopIteration` – Ethan Furman Apr 17 '20 at 02:15
  • @EthanFurman the `iter protocol` is thread safe or do I have to make any adjustments? – Jose Aug 24 '22 at 16:41
  • @Jose: No, the `iter protocol` is not thread safe. – Ethan Furman Aug 24 '22 at 18:18
  • This is not quite complete; it should also show the use of `__iter__` to return a separate iterator object, as well as implementing `__iter__` as a generator. – Karl Knechtel Aug 24 '23 at 16:55
129

I see some of you doing return self in __iter__. I just wanted to note that __iter__ itself can be a generator (thus removing the need for __next__ and for raising StopIteration exceptions).

class range:
  def __init__(self,a,b):
    self.a = a
    self.b = b
  def __iter__(self):
    i = self.a
    while i < self.b:
      yield i
      i+=1

Of course here one might as well directly make a generator, but for more complex classes it can be useful.
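
One consequence worth showing: because each call to __iter__ here produces a fresh generator, the same object can be looped over repeatedly, including in nested loops. The sketch below uses the hypothetical name Span rather than range, purely to avoid shadowing the built-in.

```python
class Span:
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __iter__(self):
        i = self.a
        while i < self.b:
            yield i
            i += 1

s = Span(0, 2)
pairs = [(x, y) for x in s for y in s]   # each loop gets its own generator
again = list(s)                          # the object itself is not used up
print(pairs, again)
```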

Manux
  • 5
    Great! It so boring writing just `return self` in `__iter__`. When I was going to try using `yield` in it I found your code doing exactly what I want to try. – Ray Feb 05 '13 at 19:32
  • 3
    But in this case, how would one implement `next()`? `return iter(self).next()`? – Lenna Apr 05 '13 at 19:52
  • 4
    @Lenna, it is already "implemented" because iter(self) returns an iterator, not a range instance. – Manux Apr 07 '13 at 17:31
  • @Manux `iter(range(5,10)).next()` is a bit cumbersome. Admittedly a bad example for `next` behavior. I'm still interested in how to give the range instance a `next` attribute. – Lenna Apr 24 '13 at 19:06
  • 3
    This the easiest way of doing it, and doesn't involve having to keep track of e.g. ``self.current`` or any other counter. This should be the top-voted answer! – astrofrog Mar 31 '14 at 13:35
  • The difference: `__iter__` being a generator is a different object than the `range()` instance. Sometimes this matters, sometimes it doesn't. – Ethan Furman Nov 09 '14 at 19:42
  • You shouldn't be using `iter(range(5,10)).next()` anyway. The correct way is `next(iter(range(5,10)))`. The `next` builtin is there exactly so you don't have to care whether or not `self` is returned in this situation. – Mad Physicist Mar 14 '16 at 20:41
  • up-voted -- this method also works more like expected (relative to the accepted answer) for something like `r = range(5); list_of_lists = list([ri, list(r)] for ri in r)` – swinman Sep 17 '16 at 16:09
  • It's interesting that `__iter__` doesn't have to raise StopIteration. A problem with defining only `__iter__` is that `next(myiterator)` doesn't work if `__next__` does not `return` individual items. Needing to use `next(iter(myiterator))` is not a wise substitute. – Asclepius Apr 18 '17 at 14:42
  • 14
    To be clear, this approach makes your class *iterable*, but not an *iterator*. You get fresh *iterators* every time you call `iter` on instances of the class, but they're not themselves instances of the class. – ShadowRanger Feb 24 '18 at 01:25
  • 1
    @MadPhysicist: On Python 2, `iter(range(5,10)).next()` and `next(iter(range(5,10)))` are already exactly equivalent. The advantage to `next` as a function has nothing to do with whether `self` is returned by `__iter__` (the behavior is identical for both code snippets). The advantages of the `next` built-in function are: 1. It works the same on Py2 and Py3, even though the method changes names between them and 2. When applicable, it can be given a second argument to return in the event that the iterator is already exhausted, rather than raising `StopIteration`. – ShadowRanger Feb 24 '18 at 01:27
105

First of all, the itertools module is incredibly useful for all sorts of cases in which an iterator would be useful, but here is all you need to create an iterator in Python:

yield

Isn't that cool? Yield can be used to replace a normal return in a function. It returns the object just the same, but instead of destroying state and exiting, it saves state for when you want to execute the next iteration. Here is an example of it in action pulled directly from the itertools function list:

def count(n=0):
    while True:
        yield n
        n += 1

As stated in the function's description (it's the count() function from the itertools module...), it produces an iterator that returns consecutive integers starting with n.
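
Since such a generator never stops on its own, the consumer has to limit it. One way, sketched here with the standard-library itertools.islice and the count function from above redefined locally:

```python
from itertools import islice

def count(n=0):
    while True:
        yield n
        n += 1

first_five = list(islice(count(10), 5))   # take five values, then stop pulling
print(first_five)
```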

Generator expressions are a whole other can of worms (awesome worms!). They may be used in place of a List Comprehension to save memory (list comprehensions create a list in memory that is destroyed after use if not assigned to a variable, but generator expressions can create a Generator Object... which is a fancy way of saying Iterator). Here is an example of a generator expression definition:

gen = (n for n in xrange(0,11))

This is very similar to our iterator definition above except the full range is predetermined to be between 0 and 10.

I just found xrange() (surprised I hadn't seen it before...) and added it to the above example. In Python 2, xrange() is an iterable version of range() which has the advantage of not prebuilding the list (in Python 3, range() itself behaves this way). It would be very useful if you had a giant corpus of data to iterate over and only had so much memory to do it in.

jpp
akdom
  • 21
    as of python 3.0 there is no longer an xrange() and the new range() behaves like the old xrange() –  Dec 18 '08 at 17:30
  • 6
    You should still use xrange in 2._, because 2to3 translates it automatically. – Phob Jul 22 '11 at 18:03
15

This question is about iterable objects, not about iterators. In Python, sequences are iterable too so one way to make an iterable class is to make it behave like a sequence, i.e. give it __getitem__ and __len__ methods. I have tested this on Python 2 and 3.

class CustomRange:

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def __getitem__(self, item):
        if item >= len(self):
            raise IndexError("CustomRange index out of range")
        return self.low + item

    def __len__(self):
        return self.high - self.low


cr = CustomRange(0, 10)
for i in cr:
    print(i)
aq2
  • 2
    It doesn't have to have a `__len__()` method. `__getitem__` alone with the expected behaviour is sufficient. – BlackJack Jun 27 '19 at 14:05
11

If you are looking for something short and simple, maybe this will be enough for you:

class A(object):
    def __init__(self, l):
        self.data = l

    def __iter__(self):
        return iter(self.data)

Example of usage:

In [3]: a = A([2,3,4])

In [4]: [i for i in a]
Out[4]: [2, 3, 4]
Danil
6

Include the following code in your class code.

    def __iter__(self):
        for x in self.iterable:
            yield x

Make sure that you replace self.iterable with the iterable which you iterate through.

Here's an example:

class someClass:
    def __init__(self,list):
        self.list = list
    def __iter__(self):
        for x in self.list:
            yield x


var = someClass([1,2,3,4,5])
for num in var: 
    print(num) 

Output

1
2
3
4
5

Note: Since strings are also iterable, they can also be used as an argument for the class

foo = someClass("Python")
for x in foo:
    print(x)

Output

P
y
t
h
o
n
Sreevatsan
5

All answers on this page are really great for a complex object. But for classes that hold builtin iterable types as attributes, like str, list, set or dict, or any implementation of collections.abc.Iterable, you can omit certain things in your class.

class Test(object):
    def __init__(self, string):
        self.string = string

    def __iter__(self):
        # Since your string is already iterable, delegate to its iterator.
        # Equivalent alternatives (only one return statement can take effect):
        #     return (ch for ch in self.string)
        #     return self.string.__iter__()
        return iter(self.string)

It can be used like:

for x in Test("abcde"):
    print(x)

# prints
# a
# b
# c
# d
# e
John Strood
  • 2
    As you said, the string is already iterable so why the extra generator expression in between instead of just asking the string for the iterator (which the generator expression does internally): `return iter(self.string)`. – BlackJack Jun 27 '19 at 14:07
  • @BlackJack You're indeed right. I do not know what persuaded me to write that way. Perhaps I was trying to avoid any confusion in an answer trying to explain the working of iterator syntax in terms of more iterator syntax. – John Strood Jun 27 '19 at 16:40
3

This is an iterable function without yield. It makes use of the iter function and a closure which keeps its state in a mutable object (a list) in the enclosing scope, for Python 2.

def count(low, high):
    counter = [0]
    def tmp():
        val = low + counter[0]
        if val < high:
            counter[0] += 1
            return val
        return None
    return iter(tmp, None)

For Python 3, closure state is kept in an immutable in the enclosing scope and nonlocal is used in local scope to update the state variable.

def count(low, high):
    counter = 0
    def tmp():
        nonlocal counter
        val = low + counter
        if val < high:
            counter += 1
            return val
        return None
    return iter(tmp, None)  

Test:

for i in count(1,10):
    print(i)
1
2
3
4
5
6
7
8
9
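
The two-argument form iter(callable, sentinel) used above calls the callable repeatedly until it returns the sentinel. A common use, sketched here with an in-memory io.StringIO standing in for a real file, is reading fixed-size chunks:

```python
import io
from functools import partial

stream = io.StringIO("abcdefgh")
# read(3) returns '' at the end of the stream, which serves as the sentinel
chunks = list(iter(partial(stream.read, 3), ''))
print(chunks)
```

Note that the sentinel value itself can never be produced by such an iterator, so with None as the sentinel (as in the count functions above) the callable cannot yield None as a real value.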
Nizam Mohamed
  • 1
    I always appreciate a clever use of two-arg `iter`, but just to be clear: This is more complex and less efficient than just using a `yield` based generator function; Python has a ton of interpreter support for `yield` based generator functions that you can't take advantage of here, making this code significantly slower. Up-voted nonetheless. – ShadowRanger Feb 24 '18 at 01:30
1
class uc_iter():
    def __init__(self):
        self.value = 0
    def __iter__(self):
        return self
    def __next__(self):
        next_value = self.value
        self.value += 2
        return next_value

Improving on the previous answer: one of the advantages of using a class is that you can add __call__ to return self.value or even next_value.

class uc_iter():
    def __init__(self):
        self.value = 0
    def __iter__(self):
        return self
    def __next__(self):
        next_value = self.value
        self.value += 2
        return next_value
    def __call__(self):
        next_value = self.value
        self.value += 2
        return next_value
c = uc_iter()
print([c() for _ in range(10)])
print([next(c) for _ in range(5)])
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
# [20, 22, 24, 26, 28]

Another example, a class based on Python's random module that can be both called and iterated, can be seen in my implementation here

Muhammad Yasirroni