Python's f.tell() doesn't work as I expected when you iterate over a file with f.next():

>>> f=open(".bash_profile", "r")
>>> f.tell()
0
>>> f.next()
"alias rm='rm -i'\n"
>>> f.tell()
397
>>> f.next()
"alias cp='cp -i'\n"
>>> f.tell()
397
>>> f.next()
"alias mv='mv -i'\n"
>>> f.tell()
397

It looks like tell() gives you the position of the read-ahead buffer rather than the position just after the line you got from next().

I've previously used the seek/tell trick to rewind one line when iterating over a file with readline(). Is there a way to rewind one line when using next()?

Jonas
new name

3 Answers

No. I would write an adapter that forwards most calls to the wrapped object but keeps a copy of the last line returned by next(), plus a separate method that makes that line come out again on the following call to next().

I would actually make the adapter wrap any iterable rather than just a file, because that sounds like it would be useful in other contexts too.

Alex's suggestion of using the itertools.tee adapter also works, but I think writing your own iterator adapter to handle this case in general would be cleaner.

Here is an example:

class rewindable_iterator(object):
    not_started = object()

    def __init__(self, iterator):
        self._iter = iter(iterator)
        self._use_save = False
        self._save = self.not_started

    def __iter__(self):
        return self

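    def next(self):
        # Replay the saved item once if backup() was called; otherwise advance.
        if self._use_save:
            self._use_save = False
        else:
            self._save = next(self._iter)
        return self._save

    __next__ = next  # Python 3 looks up __next__ instead of next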

    def backup(self):
        if self._use_save:
            raise RuntimeError("Tried to backup more than one step.")
        elif self._save is self.not_started:
            raise RuntimeError("Can't backup past the beginning.")
        self._use_save = True


fiter = rewindable_iterator(open('file.txt', 'r'))
for line in fiter:
    result = process_line(line)
    if result is DoOver:
        fiter.backup()

This wouldn't be too hard to extend into something that lets you back up by more than one value.
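One way such an extension might look is sketched below. It is not from the original answer: the class name `MultiRewindIterator` and the `max_backup` parameter are hypothetical, and the sketch assumes a bounded history is acceptable so memory use stays fixed.

```python
from collections import deque


class MultiRewindIterator(object):
    """Iterator wrapper that can back up by up to `max_backup` items.

    Hypothetical extension of the single-step adapter; the names here
    are illustrative only.
    """

    def __init__(self, iterable, max_backup=3):
        self._iter = iter(iterable)
        # Most recently returned items, oldest first, capped at max_backup.
        self._history = deque(maxlen=max_backup)
        # Items that were backed over; the next one to replay is last.
        self._pending = []

    def __iter__(self):
        return self

    def __next__(self):
        if self._pending:
            item = self._pending.pop()
        else:
            item = next(self._iter)
        self._history.append(item)
        return item

    next = __next__  # Python 2 spelling of the same method

    def backup(self, steps=1):
        if steps > len(self._history):
            raise RuntimeError("Tried to back up further than the saved history.")
        for _ in range(steps):
            # Move the newest returned item onto the replay stack.
            self._pending.append(self._history.pop())
```

Calling `backup(2)` after reading two items makes the following two calls to `next()` return those same items again, in their original order.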

lambacck
Omnifarious
  • This is the best solution for me. I already had something like a wrapper so it was easy to modify it this way. – new name Aug 21 '10 at 22:08
  • Update for python3: use `__next__` in place of next and this example will work out. See http://getpython3.com/diveintopython3/porting-code-to-python-3-with-2to3.html#next – Kevin Lee Nov 15 '12 at 08:47

itertools.tee is probably the least-bad approach -- you can't "defeat" the buffering done by iterating on the file (nor would you want to: the performance effects would be terrible), so keeping two iterators, one "one step behind" the other, seems the soundest solution to me.

import itertools as it

with open('a.txt') as f:
  f1, f2 = it.tee(f)
  f2 = it.chain([None], f2)
  for thisline, prevline in it.izip(f1, f2):
    ...
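
The snippet above relies on itertools.izip, which exists only in Python 2. A rough Python 3 equivalent is sketched here, where the built-in zip is already lazy. The helper name `with_previous` is hypothetical and not part of itertools.

```python
import itertools


def with_previous(iterable):
    """Return an iterator of (current, previous) pairs.

    `previous` is None for the first item. Hypothetical helper name.
    """
    current, previous = itertools.tee(iterable)
    # Shift the second copy one step behind by prepending a placeholder.
    previous = itertools.chain([None], previous)
    # zip stops as soon as `current` is exhausted.
    return zip(current, previous)


for thisline, prevline in with_previous(["one\n", "two\n", "three\n"]):
    print(repr(thisline), repr(prevline))
```

Passing an open file object instead of the list pairs each line with the one before it, at the cost of tee keeping one extra line in memory.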
Alex Martelli

Python's file iterator does a lot of buffering, thereby advancing the position in the file far ahead of your iteration. If you want to use file.tell() you must do it "the old way":

with open(filename) as fileob:
  line = fileob.readline()
  while line:
    print(fileob.tell())
    line = fileob.readline()
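
Because readline() keeps tell() meaningful, the seek/tell rewind mentioned in the question can be sketched as follows. This is only an illustration under assumptions: the temporary file, its contents, and the rule "re-read the line `second` once" are all made up for the example.

```python
import os
import tempfile

# Build a small throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

seen = []
rewound = False
with open(path) as fileob:
    while True:
        start = fileob.tell()      # position before reading this line
        line = fileob.readline()
        if not line:
            break
        seen.append(line)
        if line == "second\n" and not rewound:
            fileob.seek(start)     # rewind so the same line is read again
            rewound = True

os.remove(path)
print(seen)
```

The line `second` ends up in `seen` twice, because the seek puts the file position back to where that line started.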
Lesmana