Can I reset an iterator / generator in Python? I am using DictReader and would like to reset it to the beginning of the file.
On a side note, I found that the `list()` function will iterate through its argument (an iterable). Thus, if you call `list()` on the same iterable twice (e.g. the result of `zip()`), you will get an empty list on the second call! – dz902 Aug 08 '20 at 06:57
16 Answers
I see many answers suggesting itertools.tee, but that's ignoring one crucial warning in the docs for it:
This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use `list()` instead of `tee()`.
Basically, `tee` is designed for those situations where two (or more) clones of one iterator, while "getting out of sync" with each other, don't do so by much -- rather, they stay in the same "vicinity" (a few items behind or ahead of each other). Not suitable for the OP's problem of "redo from the start".
`L = list(DictReader(...))`, on the other hand, is perfectly suitable, as long as the list of dicts can fit comfortably in memory. A new "iterator from the start" (very lightweight and low-overhead) can be made at any time with `iter(L)`, and used in part or in whole without affecting new or existing ones; other access patterns are also easily available.
As several answers rightly remarked, in the specific case of csv you can also `.seek(0)` the underlying file object (a rather special case). I'm not sure that's documented and guaranteed, though it does currently work; it would probably be worth considering only for truly huge csv files, in which the list I recommend as the general approach would have too large a memory footprint.
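A minimal sketch of that list-plus-iter(L) pattern, for the record (the file name here is made up):

import csv

with open('data.csv', newline='') as f:
    L = list(csv.DictReader(f))   # read every row into memory once

it1 = iter(L)                     # a cheap "iterator from the start"
first_row = next(it1)             # consume part of it...
it2 = iter(L)                     # ...and start over any time with a new iterator
for row in it2:
    pass                          # it2 is unaffected by whatever it1 consumed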

Using `list()` to cache multiple passes over a csvreader on a 5MB file sees my runtime go from ~12secs to ~0.5s. – John Mee Oct 23 '12 at 01:33
If you have a csv file named 'blah.csv' that looks like:
a,b,c,d
1,2,3,4
2,3,4,5
3,4,5,6
you know that you can open the file for reading, and create a DictReader with
blah = open('blah.csv', 'r')
reader = csv.DictReader(blah)
Then, you will be able to get the next line with `reader.next()`, which should output
{'a': '1', 'b': '2', 'c': '3', 'd': '4'}
using it again will produce
{'a': '2', 'b': '3', 'c': '4', 'd': '5'}
However, at this point if you use `blah.seek(0)`, the next time you call `reader.next()` you will get
{'a': '1', 'b': '2', 'c': '3', 'd': '4'}
again.
This seems to be the functionality you're looking for. I'm sure there are some tricks associated with this approach that I'm not aware of, however. @Brian suggested simply creating another DictReader. This won't work if your first reader is halfway through reading the file, as your new reader will have unexpected keys and values from wherever you are in the file.
This was what my theory told me, nice to see that what I thought should happen, does. – Wayne Werner Jul 16 '10 at 18:02
@Wilduck: the behavior you're describing with another instance of DictReader won't happen if you make a new file handle and pass that to the second DictReader, right? – Oct 24 '12 at 17:52
No. Python's iterator protocol is very simple, and only provides a single method (`.next()` or `__next__()`), and no method to reset an iterator in general.
The common pattern is to instead create a new iterator using the same procedure again.
If you want to "save off" an iterator so that you can go back to its beginning, you may also fork the iterator by using `itertools.tee`.
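A small sketch of both options (the data here is just a stand-in for whatever produces your iterator):

from itertools import tee

data = [1, 2, 3]

# Option 1: "reset" by simply building a new iterator from the same source.
it = iter(data)
list(it)           # exhausts it
it = iter(data)    # a fresh iterator, starting from the beginning again

# Option 2: fork the iterator up front with itertools.tee.
a, b = tee(iter(data))
list(a)            # consuming a does not consume b...
list(b)            # ...so b still yields 1, 2, 3 (tee buffers items for it)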

While your analysis of the .next() method is probably correct, there is a fairly simple way to get what the OP is asking for. – Wilduck Jul 16 '10 at 15:27
@Wilduck: I see your answer. I just answered the iterator question, and I have no idea about the `csv` module. Hopefully both answers are useful to the original poster. – u0b34a0f6ae Jul 16 '10 at 15:33
Strictly, the iterator protocol also requires `__iter__`. That is, iterators are also required to be iterables. – Steve Jessop Jan 22 '14 at 13:39
Yes, if you use `numpy.nditer` to build your iterator.
>>> import numpy
>>> lst = [1,2,3,4,5]
>>> itr = numpy.nditer([lst])
>>> itr.next()
1
>>> itr.next()
2
>>> itr.finished
False
>>> itr.reset()
>>> itr.next()
1

@LWZ: I don't think so, but you can `try:` the `next()` and on a `StopIteration` exception do a `reset()`. – Dennis Williamson Jul 28 '16 at 22:07
Note that the limit of "operands" here is 32: https://stackoverflow.com/questions/51856685/python-np-nditer-valueerror-too-many-operands – Simon Jul 07 '19 at 11:36
There's a bug in using `.seek(0)` as advocated by Alex Martelli and Wilduck above, namely that the next call to `.next()` will give you a dictionary of your header row in the form of `{key1: key1, key2: key2, ...}`. The workaround is to follow `file.seek(0)` with a call to `reader.next()` to get rid of the header row.
So your code would look something like this:
f_in = open('myfile.csv','r')
reader = csv.DictReader(f_in)
for record in reader:
    if some_condition:
        # reset reader to first row of data on 2nd line of file
        f_in.seek(0)
        reader.next()
        continue
    do_something(record)

This is perhaps orthogonal to the original question, but one could wrap the iterator in a function that returns the iterator.
def get_iter():
    # build and return a brand-new iterator here each time the function is called
    return iterator
To reset the iterator, just call the function again. This is of course trivial if the said function takes no arguments.
In the case that the function requires some arguments, use functools.partial to create a closure that can be passed instead of the original iterator.
def get_iter(arg1, arg2):
    # again, build and return a fresh iterator from arg1 and arg2
    return iterator

from functools import partial
iter_clos = partial(get_iter, a1, a2)
This seems to avoid the caching that tee (n copies) or list (1 copy) would need to do.
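As a rough sketch of how that could look for the question's DictReader case (the file name is hypothetical; each call re-opens the file and hands back a brand-new iterator):

import csv
from functools import partial

def iter_rows(path):
    # a generator: every call opens the file afresh and yields its rows
    with open(path, newline='') as f:
        yield from csv.DictReader(f)

make_rows = partial(iter_rows, 'data.csv')

for row in make_rows():   # first pass
    pass
for row in make_rows():   # "reset" simply by asking for a new iterator
    pass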

For small files, you may consider using `more_itertools.seekable` - a third-party tool that offers resetting iterables.
Demo
import csv
import more_itertools as mit

filename = "data/iris.csv"
with open(filename, "r") as f:
    reader = csv.DictReader(f)
    iterable = mit.seekable(reader)               # 1
    print(next(iterable))                         # 2
    print(next(iterable))
    print(next(iterable))

    print("\nReset iterable\n--------------")
    iterable.seek(0)                              # 3
    print(next(iterable))
    print(next(iterable))
    print(next(iterable))
Output
{'Sepal width': '3.5', 'Petal width': '0.2', 'Petal length': '1.4', 'Sepal length': '5.1', 'Species': 'Iris-setosa'}
{'Sepal width': '3', 'Petal width': '0.2', 'Petal length': '1.4', 'Sepal length': '4.9', 'Species': 'Iris-setosa'}
{'Sepal width': '3.2', 'Petal width': '0.2', 'Petal length': '1.3', 'Sepal length': '4.7', 'Species': 'Iris-setosa'}
Reset iterable
--------------
{'Sepal width': '3.5', 'Petal width': '0.2', 'Petal length': '1.4', 'Sepal length': '5.1', 'Species': 'Iris-setosa'}
{'Sepal width': '3', 'Petal width': '0.2', 'Petal length': '1.4', 'Sepal length': '4.9', 'Species': 'Iris-setosa'}
{'Sepal width': '3.2', 'Petal width': '0.2', 'Petal length': '1.3', 'Sepal length': '4.7', 'Species': 'Iris-setosa'}
Here a `DictReader` is wrapped in a `seekable` object (1) and advanced (2). The `seek()` method is used to reset/rewind the iterator to the 0th position (3).
Note: memory consumption grows with iteration, so be wary of applying this tool to large files, as indicated in the docs.

One possible option is to use `itertools.cycle()`, which will allow you to iterate indefinitely without any trick like `.seek(0)`.
iterDic = itertools.cycle(csv.DictReader(open('file.csv')))
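As a rough illustration of what that gives you (the row count of 3 is only for the demo; cycle caches the rows internally and repeats them forever):

import csv
import itertools

rows = itertools.cycle(csv.DictReader(open('file.csv', newline='')))

first_pass = list(itertools.islice(rows, 3))    # assuming the file has 3 data rows
second_pass = list(itertools.islice(rows, 3))   # the same 3 rows come around again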

While there is no iterator reset, the "itertools" module from Python 2.6 (and later) has some utilities that can help there. One of them is `tee`, which can make multiple copies of an iterator, and cache the results of the one running ahead, so that these results are used on the copies. It will serve your purposes:
>>> def printiter(n):
...     for i in xrange(n):
...         print "iterating value %d" % i
...         yield i
>>> from itertools import tee
>>> a, b = tee(printiter(5), 2)
>>> list(a)
iterating value 0
iterating value 1
iterating value 2
iterating value 3
iterating value 4
[0, 1, 2, 3, 4]
>>> list(b)
[0, 1, 2, 3, 4]

Return a newly created iterator from the `iter()` call once the previous pass has been exhausted:
class ResetIter:
    def __init__(self, num):
        self.num = num
        self.i = -1

    def __iter__(self):
        if self.i == self.num-1:  # here, return the new object
            return self.__class__(self.num)
        return self

    def __next__(self):
        if self.i == self.num-1:
            raise StopIteration
        if self.i <= self.num-1:
            self.i += 1
            return self.i
reset_iter = ResetIter(10)
for i in reset_iter:
    print(i, end=' ')
print()
for i in reset_iter:
    print(i, end=' ')
print()
for i in reset_iter:
    print(i, end=' ')
Output:
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9

For DictReader:
f = open(filename, "rb")
d = csv.DictReader(f, delimiter=",")
f.seek(0)
d.__init__(f, delimiter=",")
For DictWriter:
f = open(filename, "rb+")
d = csv.DictWriter(f, fieldnames=fields, delimiter=",")
f.seek(0)
f.truncate(0)
d.__init__(f, fieldnames=fields, delimiter=",")
d.writeheader()
f.flush()

`list(generator())` returns all of the values from a freshly created generator; since each call to the generator function builds a new generator object, calling it again effectively restarts the iteration.
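A tiny illustration of that idea (the generator function here is made up):

def gen():
    # each call to gen() produces a brand-new generator object
    yield from range(3)

print(list(gen()))   # [0, 1, 2]
print(list(gen()))   # [0, 1, 2] again -- nothing to "reset", just call it again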

Problem
I've had the same issue before. After analyzing my code, I realized that attempting to reset the iterator inside of loops slightly increases the time complexity and it also makes the code a bit ugly.
Solution
Open the file and save the rows to a variable in memory.
# initialize list of rows
rows = []
# open the file and temporarily name it as 'my_file'
with open('myfile.csv', 'rb') as my_file:
    # set up the reader using the opened file
    myfilereader = csv.DictReader(my_file)
    # loop through each row of the reader
    for row in myfilereader:
        # add the row to the list of rows
        rows.append(row)
Now you can loop through rows anywhere in your scope without dealing with an iterator.

I'm arriving at this same issue - while I like the `tee()` solution, I don't know how big my files are going to be, and the memory warnings about consuming one iterator first before the other are putting me off adopting that method.
Instead, I'm creating a pair of iterators using `iter()` calls, and using the first for my initial run-through, before switching to the second one for the final run.
So, in the case of a dict-reader, if the reader is defined using:
d = csv.DictReader(f, delimiter=",")
I can create a pair of iterators from this "specification" - using:
d1, d2 = iter(d), iter(d)
I can then run my 1st-pass code against `d1`, safe in the knowledge that the second iterator `d2` has been defined from the same root specification.
I've not tested this exhaustively, but it appears to work with dummy data.

Only if the underlying type provides a mechanism for doing so (e.g. `fp.seek(0)`).
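For the file-backed DictReader in the question, that amounts to something like this sketch (file name hypothetical; a fresh reader is built after the seek so the header is re-parsed):

import csv

f = open('data.csv', newline='')
first_pass = list(csv.DictReader(f))

f.seek(0)                              # rewind the underlying file object
second_pass = list(csv.DictReader(f))  # same rows again
f.close()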

The simplest solution possible: use deepcopy
from copy import deepcopy

iterator = your_iterator
# Start iteration
iterator_altered = deepcopy(iterator)
for _ in range(2):
    a = next(iter(iterator_altered))
# Your iterator is still unaltered.
I think this is the simplest approach.
