8

Suppose I have the following function:

def print_twice(x):
    for i in x: print(i)
    for i in x: print(i)

When I run:

print_twice([1,2,3])

or:

print_twice((1,2,3))

I get the expected result: the numbers 1,2,3 are printed twice.

But when I run:

print_twice(zip([1,2,3],[4,5,6]))

the pairs (1,4),(2,5),(3,6) are printed only once. Probably, this is because zip returns an iterator that is exhausted after one pass.

How can I modify the function print_twice such that it will correctly handle all inputs?

I could insert a line at the beginning of the function: x = list(x). But this might be inefficient if x is already a list, a tuple, a range, or any other iterable that can be iterated more than once. Is there a more efficient solution?

Erel Segal-Halevi
  • Does this look like it helps? https://stackoverflow.com/q/6416538/5763413 – blackbrandt Dec 16 '21 at 15:31
  • 2
    Hi Erel, you could check if the argument is of type `iterator` and, if it is, use `itertools.tee()`. Please have a look at [this post](https://stackoverflow.com/questions/1271320/resetting-generator-object-in-python) – Jonathan Weine Dec 16 '21 at 15:34
  • @JonathanWeine is `iterator` the only thing that is exhausted? (i.e., if it is not an iterator, can I just iterate over it twice?) – Erel Segal-Halevi Dec 17 '21 at 00:39
  • Does this answer your question? [Why can't I iterate twice over the same data?](https://stackoverflow.com/questions/25336726/why-cant-i-iterate-twice-over-the-same-data) (use `iter` to ensure you can use `tee`, see also https://stackoverflow.com/questions/5933966/is-it-possible-to-convert-a-list-type-into-a-generator-without-iterating-through). – mkrieger1 Dec 17 '21 at 08:43
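
The `itertools.tee` suggestion from the comments above can be sketched roughly as follows. This is only an illustration of that idea, not a recommendation: `tee` has to buffer every item that one copy has produced and the other has not yet consumed, so for two full passes in sequence it stores the whole input anyway.

```python
from itertools import tee

def print_twice(x):
    # tee() calls iter(x) itself, so lists, tuples and one-shot iterators all work.
    first, second = tee(x, 2)
    for i in first: print(i)
    for i in second: print(i)

print_twice(zip([1, 2, 3], [4, 5, 6]))  # each pair is printed twice
```

The itertools documentation itself notes that when one copy consumes most or all of the data before the other starts, `list()` is faster than `tee()`, which is the situation here.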

4 Answers

8

A simple test to see if x will be consumed when you iterate over it is iter(x) is x. This is reliable, since it's specified as part of the iterator protocol (docs):

Iterators are required to have an __iter__() method that returns the iterator object itself

Conversely, if iter(x) returns x itself then x must be an iterator, since it was returned by the iter function.

Some checks:

def is_iterator(x):
    return iter(x) is x

for obj in [
    # not iterators
    [1, 2, 3],
    (1, 2, 3),
    {1: 2, 3: 4},
    range(3),
    # iterators
    (x for x in range(3)),
    iter([1, 2, 3]),
    zip([1, 2], [3, 4]),
    filter(lambda x: x % 2 == 0, [1, 2, 3]),
    map(lambda x: 2 * x, [1, 2, 3]),
]:
    name = type(obj).__name__
    if is_iterator(obj):
        print(name, 'is an iterator')
    else:
        print(name, 'is not an iterator')

Results:

list is not an iterator
tuple is not an iterator
dict is not an iterator
range is not an iterator
generator is an iterator
list_iterator is an iterator
zip is an iterator
filter is an iterator
map is an iterator

So, to ensure that x can be iterated multiple times, without making an unnecessary copy if it already can be, you can write something like:

if iter(x) is x:
    x = list(x)
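
Putting that check into the function from the question might look like the sketch below. The name `ensure_reiterable` is a hypothetical helper introduced only for this illustration; it is not part of the standard library.

```python
def ensure_reiterable(x):
    """Hypothetical helper: return x unchanged unless it is a one-shot iterator."""
    # An iterator returns itself from iter(); copy it so a second pass is possible.
    if iter(x) is x:
        return list(x)
    return x

def print_twice(x):
    x = ensure_reiterable(x)
    for i in x: print(i)
    for i in x: print(i)

print_twice(zip([1, 2, 3], [4, 5, 6]))  # each pair is printed twice
```

Lists, tuples and ranges pass through this helper as the same object, so no copy is made for them.
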
kaya3
  • This explains how to test if something is an iterator, but how does it answer the question how to iterate twice over an iterator? – mkrieger1 Dec 17 '21 at 16:09
  • @mkrieger1 OP already knows that `x = list(x)` will ensure that `x` can be iterated over twice, as far as I can tell the question is how to only do that when necessary. – kaya3 Dec 17 '21 at 16:47
  • `x = itertools.repeat(None)` fulfills `iter(x) is x`, but I challenge you to exhaust it :-P – Kelly Bundy Dec 19 '21 at 12:24
  • @KellyBundy Fair, but in response I challenge you to iterate over it twice :-p – kaya3 Dec 19 '21 at 12:39
3

I could insert a line at the beginning of the function: x = list(x). But this might be inefficient if x is already a list, a tuple, a range, or any other iterable that can be iterated more than once. Is there a more efficient solution?

Copying single-use iterables to a list is perfectly adequate, and reasonably efficient even for multi-use iterables.

The list (and to some extent tuple) type is one of the most optimised data structures in Python. Common operations such as copying a list or tuple to a list are internally optimised;¹ even for iterables that are not special-cased, copying them to a list is significantly faster than any realistic work done by two (or more) loops.

def print_twice(x):
    x = list(x)
    for i in x: print(i)
    for i in x: print(i)

Copying indiscriminately can also be advantageous in the context of concurrency, when the iterable may be modified while the function is running. Common cases are threading and weakref collections.
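
A minimal sketch of why the snapshot helps, assuming for the sake of a deterministic example that the modification comes from a callback in the same thread rather than from another thread or from weak references being collected. The helper `visit_all` and its `copy_first` flag are hypothetical names used only here.

```python
def visit_all(items, callback, copy_first):
    # Hypothetical helper: call callback on every item, optionally over a snapshot.
    if copy_first:
        items = list(items)
    for item in items:
        callback(item)

pending = {1, 2, 3}

# The callback removes items from the very set that is being iterated over.
try:
    visit_all(pending, pending.discard, copy_first=False)
except RuntimeError as err:
    print("without copy:", err)

pending = {1, 2, 3}
visit_all(pending, pending.discard, copy_first=True)
print("with copy:", pending)  # the loop ran over the snapshot, so the set is now empty
```

Iterating over the live set fails as soon as its size changes, while iterating over the list copy is unaffected by changes to the original.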


In case one wants to avoid needless copies, checking whether the iterable is a Collection is a reasonable guard.

from collections.abc import Collection

x = list(x) if not isinstance(x, Collection) else x

Alternatively, one can check whether the iterable is in fact an iterator, since this implies statefulness and thus single-use.

from collections.abc import Iterator

# either check against the abstract base class ...
x = list(x) if isinstance(x, Iterator) else x
# ... or rely on the iterator protocol directly
x = list(x) if iter(x) is x else x

Notably, the builtins zip, filter, map, ... and generators all are iterators.
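
To see where the two guards disagree, here is a small sketch. `Countdown` is a hypothetical class made up for this example: it can be iterated any number of times, but it defines only `__iter__`, so it is neither a Collection nor an Iterator.

```python
from collections.abc import Collection, Iterator

class Countdown:
    """Hypothetical re-iterable: defines __iter__ only, so it is not a Collection."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # A fresh iterator on every call, so repeated passes work.
        return iter(range(self.start, 0, -1))

for obj in [[1, 2], range(3), Countdown(3), zip([1], [2])]:
    print(type(obj).__name__,
          "Collection:", isinstance(obj, Collection),
          "Iterator:", isinstance(obj, Iterator))
```

The Collection guard is the more conservative of the two: it would copy `Countdown(3)` even though that is not strictly needed, whereas the Iterator guard copies only the `zip` object.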


¹ Copying a list of 128 items is roughly as fast as checking whether it is a Collection.

MisterMiyagi
  • Regarding the first point: what if the list is very long (e.g. millions of items): is it still efficient to copy to another list? – Erel Segal-Halevi Dec 17 '21 at 10:02
  • 1
    @ErelSegal-Halevi In practice yes. Copying a list is a tight C-loop copying pointers and incrementing reference counts; the memory requirement is a precisely allocated pointer array. While both can still be a lot for millions of items, you have to weigh that against your function *operating* on these millions of items anyway. Remember that you only need this if you are going to run over the entire list at least twice – adding a third run will in the *absolute worst case of doing absolutely nothing with the data* add 50% runtime. – MisterMiyagi Dec 17 '21 at 10:11
1

zip returns an iterator. Once it has been iterated over, it is exhausted and cannot be iterated over again.

Converting x to a list, as you said, would work, but it would not always be efficient. If you want to make sure that only zip objects get converted to a list, you can check for that type:

if isinstance(x, zip):
    x = list(x)
Alexandru DuDu
  • 1
    It's better to use `isinstance(x, zip)` instead of `type(x) == zip`. See: https://stackoverflow.com/questions/1549801/what-are-the-differences-between-type-and-isinstance – exciteabletom Dec 16 '21 at 15:45
  • 1
    `zip` is only an example. What if it is a different thing that gets exhausted? – Erel Segal-Halevi Dec 17 '21 at 00:28
  • In this case maybe this post can help you: https://stackoverflow.com/questions/7976269/how-can-i-get-generators-iterators-to-evaluate-as-false-when-exhausted but as I understand from it, it seems like you have to provide the solution yourself. So I guess, instead of checking most of the possible types and creating a list when it is necessary, you can simply create a list every time. This will be useless if the object is already a list etc., but it's easier and in the end it doesn't add too much workload for the compiler. – Alexandru DuDu Dec 17 '21 at 07:42
  • @AlexandruDuDu It is not much workload for the compiler, but if the list is very long, it might be a lot of work for the computer. – Erel Segal-Halevi Dec 17 '21 at 09:59
-4

Modify your print_twice function

def print_twice(x):
    val = x  # val refers to the same object as x; a one-shot iterator is still exhausted after the first pass
    for _ in range(2):
        for i in val: print(i)
itprorh66