What is the purpose of Python's itertools.repeat?

Question

For every use I can think of for Python's itertools.repeat() class, I can think of another equally (possibly more) acceptable solution to achieve the same effect. For example:

>>> [i for i in itertools.repeat('example', 5)]
['example', 'example', 'example', 'example', 'example']
>>> ['example'] * 5
['example', 'example', 'example', 'example', 'example']

>>> list(map(str.upper, itertools.repeat('example', 5)))
['EXAMPLE', 'EXAMPLE', 'EXAMPLE', 'EXAMPLE', 'EXAMPLE']
>>> ['example'.upper()] * 5
['EXAMPLE', 'EXAMPLE', 'EXAMPLE', 'EXAMPLE', 'EXAMPLE']

Is there any case in which itertools.repeat() would be the most appropriate solution? If so, under what circumstances?

I added a new answer that shows the original motivating use case for itertools repeat. Also, I've just updated the Python docs to reflect this usage note. — Raymond Hettinger, Feb 01 '12 at 17:11
3 of your 4 code examples won't actually work. The first one creates a generator expression, not a `tuple` (you'd want `tuple(itertools.repeat('example', 5))`), the second multiplies `'example'` itself to make `'exampleexampleexampleexampleexample'` because `('example')` doesn't make a `tuple` in the first place (you need `('example',) * 5`), and your third example uses `map`, which would return a `map` object, because Python 3 `map` is lazy (you'd have to wrap it in `list` to get the provided result). It's an interesting question, but faking your code samples hurts it. — ShadowRanger, Dec 26 '18 at 14:48
@ShadowRanger, I was pretty new to Python when I made this post and I just quickly typed up some examples without checking the actual output. A little pedantic, but I've fixed it now anyway. Thanks! :) — Tyler Crompton, Jan 03 '19 at 19:00

Raymond Hettinger · Answer 1 · 2017-03-26T05:07:36.583

43

The primary purpose of itertools.repeat is to supply a stream of constant values to be used with map or zip:

>>> list(map(pow, range(10), repeat(2)))     # list of squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

The secondary purpose is that it gives a very fast way to loop a fixed number of times like this:

for _ in itertools.repeat(None, 10000):
    do_something()

This is faster than:

for i in range(10000):
    do_something()

The former wins because all it needs to do is update the reference count for the existing None object. The latter loses because the range() or xrange() needs to manufacture 10,000 distinct integer objects.
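A rough way to check this claim on your own interpreter is with timeit. The sketch below is not from the original answer: do_something is a stand-in no-op, and the loop count and number of repetitions are arbitrary. Results vary by Python version and machine, so no figures are given here.

```python
import itertools
import timeit


def do_something():
    """Placeholder for the loop body (hypothetical; does nothing)."""
    pass


def loop_with_repeat(n):
    # repeat() hands back the same None object on every iteration.
    for _ in itertools.repeat(None, n):
        do_something()


def loop_with_range(n):
    # range() has to produce an int object for each iteration.
    for i in range(n):
        do_something()


n = 10000
repeat_time = timeit.timeit(lambda: loop_with_repeat(n), number=100)
range_time = timeit.timeit(lambda: loop_with_range(n), number=100)
print(f"repeat: {repeat_time:.4f}s  range: {range_time:.4f}s")
```

If the two timings come out close, the loop body probably dominates the cost and the choice of iterator matters little.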

Note, Guido himself uses that fast looping technique in the timeit() module. See the source at https://hg.python.org/cpython/file/2.7/Lib/timeit.py#l195 :

    if itertools:
        it = itertools.repeat(None, number)
    else:
        it = [None] * number
    gcold = gc.isenabled()
    gc.disable()
    try:
        timing = self.inner(it, self.timer)

edited Mar 26 '17 at 05:07

answered Feb 01 '12 at 15:51

Raymond Hettinger


This answer and `repeat` is a treasure. Why is this hidden in `itertools` and not a built-in? `for _ in range(x): do()` is such a common pattern. – Darkonaut Feb 18 '19 at 22:54
@Darkonaut Your implicit assumption is that Python was designed to be fast. It wasn't. It was designed to be easy to read. – Veky Apr 05 '21 at 05:10
@Veky You seem to confuse Python the language with one of its implementations (CPython). The reference count Raymond mentions isn't part of the language. A language itself doesn't have the notion of speed. – Darkonaut Apr 05 '21 at 05:40
I can't see why you would think so. But I can assure you that when Guido designed Python, he designed a very concrete implementation. (Yes, CPython. Other implementations came along almost 10 years after.) And in that language, Pythonic way to loop n times was to use range, not repeat. Even when range returned a list, it was preferred (xrange came later). So obviously speed was not a primary concern. I didn't mention reference counting at all. – Veky Apr 06 '21 at 14:05
@Veky My assumption wasn't "Python was designed to be fast", nor would it have to be in any meaning only to provide a nicer idiom than `for _ in range(x): do()` for the case I don't care about the sequence, but having something fast_er_ for tight loops would be nice. My initial comment was about interpreter and library, yours about language design and made it sound like using anything other than `range()` would somehow sacrifice readability for speed. Python's readability stems from syntax mostly, not from what functions come along with an interpreter or how they are implemented. – Darkonaut Apr 09 '21 at 07:05
@Veky Anyway, I don't think there's much room for target conflicts between readability and performance beyond the one time decision for a high-level language which doesn't allow fine grained control over memory usage. For the CPython interpreter, performance _is_ and always was of concern. For example you'll find performance related improvements with every CPython update (e.g. [floor division](https://bugs.python.org/issue39434) in Python 3.9). Also noteworthy, Guido opposes dropping the GIL as long as it would result in degraded performance(!) for single-threaded execution. – Darkonaut Apr 09 '21 at 07:07
I don't think this discussion can serve any useful purpose anymore. I have access to the same set of facts you quote, only I interpret them completely differently. Oh well. It happens sometimes. – Veky Apr 09 '21 at 11:18

score 32 · Accepted Answer · edited Dec 26 '18 at 14:45

The itertools.repeat function is lazy; it only uses the memory required for one item. On the other hand, the (a,) * n and [a] * n idioms create n copies of the object in memory. For five items, the multiplication idiom is probably better, but you might notice a resource problem if you had to repeat something, say, a million times.

Still, it is hard to imagine many static uses for itertools.repeat. However, the fact that itertools.repeat is a function allows you to use it in many functional applications. For example, you might have some library function func which operates on an iterable of input. Sometimes, you might have pre-constructed lists of various items. Other times, you may just want to operate on a uniform list. If the list is big, itertools.repeat will save you memory.
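As a rough illustration of the memory point (a sketch, not part of the original answer; consume is a hypothetical stand-in for the library function func), sys.getsizeof shows the difference in container size. Note that the list holds a million references to one string rather than a million separate strings.

```python
import sys
from itertools import repeat

n = 1_000_000

lazy = repeat('example', n)   # iterator object; nothing is materialised
eager = ['example'] * n       # list holding n references to the same string

print(sys.getsizeof(lazy))    # small and independent of n
print(sys.getsizeof(eager))   # grows linearly with n


def consume(iterable):
    """Hypothetical stand-in for a library function that accepts any iterable."""
    return sum(len(item) for item in iterable)


# Both inputs give the same result; only the memory footprint differs.
assert consume(lazy) == consume(eager)
```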

Finally, repeat makes possible the so-called "iterator algebra" described in the itertools documentation. Even the itertools module itself uses the repeat function. For example, the following code is given as an equivalent implementation of itertools.izip_longest (even though the real code is probably written in C). Note the use of repeat seven lines from the bottom:

from itertools import chain, repeat

class ZipExhausted(Exception):
    pass

def izip_longest(*args, **kwds):
    # izip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    fillvalue = kwds.get('fillvalue')
    counter = [len(args) - 1]
    def sentinel():
        if not counter[0]:
            raise ZipExhausted
        counter[0] -= 1
        yield fillvalue
    fillers = repeat(fillvalue)
    iterators = [chain(it, sentinel(), fillers) for it in args]
    try:
        while iterators:
            yield tuple(map(next, iterators))
    except ZipExhausted:
        pass
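In Python 3 the built-in equivalent is named itertools.zip_longest. As a quick sanity check of the behaviour described in the comment at the top of the function (this snippet is an addition, not part of the documentation excerpt):

```python
from itertools import zip_longest

# The shorter input is padded with the fill value.
pairs = list(zip_longest('ABCD', 'xy', fillvalue='-'))
print(pairs)  # [('A', 'x'), ('B', 'y'), ('C', '-'), ('D', '-')]
```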

Minor quibble: `[a] * n` does not create n copies of a in memory. It creates n references to a single copy of a. In some cases the difference can be quite significant; try `a = [[]] * 5; a[0].append(1)`. — Thomas K, Jan 30 '12 at 12:49
Good point. I keep forgetting that almost everything in Python is a reference. I guess that also abates the memory usage problem somewhat, but I'd guess a million references still has a nontrivial resource requirement. — HardlyKnowEm, Jan 30 '12 at 15:22

score 16 · Answer 3 · answered Jan 30 '12 at 04:08

Your example of foo * 5 looks superficially similar to itertools.repeat(foo, 5), but it is actually quite different.

If you write foo * 100000, the interpreter must build the entire 100,000-element result in memory before it can give you an answer. That is expensive in both time and memory.

But if you write itertools.repeat(foo, 100000), the interpreter can return an iterator that serves the same function, and doesn't need to compute a result until you need it -- say, by using it in a function that wants to know each result in the sequence.

That's the major advantage of iterators: they can defer the computation of a part (or all) of a list until you really need the answer.
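A minimal sketch of that deferral, assuming foo is just a short string (the name and value are illustrative only): the iterator is created immediately, and values appear only when something asks for them.

```python
from itertools import islice, repeat

foo = 'foo'
it = repeat(foo, 100000)     # returns at once; no 100,000-element result exists

first = next(it)             # values are produced one at a time, on demand
next_three = list(islice(it, 3))

print(first)        # foo
print(next_three)   # ['foo', 'foo', 'foo']
```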

John Feminella


Why not just use `for i in range(100000):` and then access `foo` inside the loop instead of asking this function what value you gave it? – Tyler Crompton Jan 30 '12 at 04:16
@TylerCrompton: The iterator can be passed to other things that expect any kind of iterator, without regard for its interior contents. You can't do the same with a range (it is iterable, but is not itself an iterator). – John Feminella Jan 30 '12 at 04:21
I see your point, but as far as the end of your comment goes, in Python 3? – Tyler Crompton Jan 30 '12 at 04:23
`range` is an iterator in Python 3, but in Python 2, it returns a list. In Python 2, use `xrange` for an iterator; in Python 3, use `list(range(...))` for a list. – HardlyKnowEm Jan 30 '12 at 05:19
Sorry, I didn't see that this question was tagged Python-3. Yes, @mlefavor is correct. – John Feminella Jan 30 '12 at 05:26
@HardlyKnowEm: Pedantic: Py3 `range` and Py2 `xrange` are lazy, but they aren't actually iterators themselves. They're itera*bles*, not itera*tors*. They're immutable sequences (slightly crippled on Python 2 `xrange`, but fairly complete on Python 3), just ones that compute their contents on demand. It makes a difference when you iterate the same one twice; `r = range(10)` (`xrange` on Py2) followed by `sum(r)` then `sum(r)` again will produce the same result each time; if it was an iterator, the second call would produce `0` (because the first call would exhaust the iterator). – ShadowRanger Dec 26 '18 at 14:54

score 3 · Answer 4 · answered Nov 27 '13 at 08:34

As mentioned before, it works well with zip. Another example:

from itertools import repeat

fruits = ['apples', 'oranges', 'bananas']

# Initialize inventory to zero for each fruit type.
inventory = dict(zip(fruits, repeat(0)))

Result:

{'apples': 0, 'oranges': 0, 'bananas': 0}

To do this without repeat, I'd have to involve len(fruits).
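For comparison, a sketch of the version without repeat that the sentence above alludes to (one plausible reading, not the author's code): the filler list has to be sized explicitly.

```python
fruits = ['apples', 'oranges', 'bananas']

# Without repeat, the filler sequence must be built to match len(fruits).
inventory = dict(zip(fruits, [0] * len(fruits)))
print(inventory)  # {'apples': 0, 'oranges': 0, 'bananas': 0}
```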

Jonathon Reinhart


`inventory = {fruit: 0 for fruit in fruits}` is more readable and slightly faster. – Tyler Crompton Nov 27 '13 at 09:46
@TylerCrompton Indeed. I'm not sure that I've used that syntax before to initialize a dictionary. Or I've just been using too much LINQ :-) Thanks for the informative comment. – Jonathon Reinhart Nov 27 '13 at 23:00
@TylerCrompton: If we're going for speed, `dict.fromkeys(fruits, 0)` is the fastest (not for only three items with a constant value, due to slightly higher fixed overhead, but as the number of items in `fruits` increases, `dict.fromkeys` pulls ahead, starting around eight items); asymptotically on my machine, it runs in about 2/3rd the time of the `dict` comprehension for huge inputs. As of 3.6 (with guaranteed ordering for `dict`s), `dict.fromkeys(x)` is a really efficient way to uniquify inputs while preserving ordering (unlike `set(x)`, which loses ordering). – ShadowRanger Dec 26 '18 at 15:05

score 3 · Answer 5 · answered Jan 30 '12 at 04:09

It's an iterator. Big clue here: it's in the itertools module. From the documentation you linked to:

itertools.repeat(object[, times]) Make an iterator that returns object over and over again. Runs indefinitely unless the times argument is specified.

So you won't ever have all that stuff in memory. An example where you want to use it might be

n = 25
t = 0
for x in itertools.repeat(4):
    if t > n:
        print(t)
        break
    t += x

as this will allow you an arbitrary number of 4s, or whatever you might need an infinite list of.

machine yearning


You could change line 3 to `while True:` and the `x` on line 7 to `4` and it would do the same exact thing, would be more readable, and would be slightly faster. This is why I was wondering if it had any purpose. – Tyler Crompton Jan 30 '12 at 07:33
@TylerCrompton: Note: Amusingly, on Python 2, `while True:` would be slower than `for x in itertools.repeat(4):`, because `True` wasn't a keyword back then, so `while True:` actually loaded it and tested it for truthiness on each loop to be sure no one had reassigned it (`while 1:` was a true unconditionally infinite loop). `repeat` kept the iterator on the stack (no lookup in the built-ins scope) and saved that work. Thankfully, on Python 3 `True` and `False` are keywords, and `while True:` really is an unconditionally infinite loop at the byte code level. – ShadowRanger Dec 26 '18 at 15:10

score 0 · Answer 6 · answered Feb 14 '17 at 00:37

I usually use repeat in conjunction with chain and cycle. Here is an example:

from itertools import chain, cycle, repeat

fruits = ['apples', 'oranges', 'bananas', 'pineapples', 'grapes', 'berries']

inventory = list(zip(fruits, chain(repeat(10, 2), cycle(range(1, 3)))))

print(inventory)

This gives the first two fruits the value 10, then cycles through the values 1 and 2 for the remaining fruits.
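For reference, a self-contained Python 3 sketch of the same idea with the resulting pairs shown (the intermediate name values is introduced here only for readability):

```python
from itertools import chain, cycle, repeat

fruits = ['apples', 'oranges', 'bananas', 'pineapples', 'grapes', 'berries']

# Two 10s first, then 1, 2, 1, 2, ... for as long as zip needs values.
values = chain(repeat(10, 2), cycle(range(1, 3)))
inventory = list(zip(fruits, values))
print(inventory)
# [('apples', 10), ('oranges', 10), ('bananas', 1),
#  ('pineapples', 2), ('grapes', 1), ('berries', 2)]
```

The infinite cycle is safe here because zip stops as soon as fruits is exhausted.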
