
Unlike in Python 2 (2.7.15), I'm seeing strange f.tell() behavior in Python 3 (3.6.5) when a binary file is opened for appending and reading. If n bytes are written while the current seek position is not at the end of the file, the following things seem to happen, as expected:

  1. The file pointer is moved to the end of the file.
  2. The n bytes are written.
  3. n is added to the file pointer.

However, f.tell() does not appear to notice step 1, so the value it returns lags behind the actual file pointer by a constant amount. I see the same behavior on both Windows and Linux.

Here's some Python 3 code demonstrating the issue:

import io

# Create file with some content
f = open('myfile', 'wb')
f.write(b'abc')
print(f.tell())                 # 3
f.close()

# Now reopen file in append+read mode and check that appending works
f = open('myfile', 'a+b')
print(f.tell())                 # 3
f.write(b'def')                 # (Append)
print(f.tell())                 # 6

# Now seek to start of file and append again -> tell() gets out of sync!
print(f.seek(0))                # 0
print(f.tell())                 # 0
f.write(b'ghi')                 # (Append)
print(f.tell())                 # 3!!! (expected 9)
f.write(b'jkl')                 # (Append)
print(f.tell())                 # 6!!! (expected 12)

# Try calling seek without moving file pointer -> tell() starts working again
print(f.seek(0, io.SEEK_CUR))   # 12 (correct)
print(f.tell())                 # 12 (correct)

# Read whole file to verify its contents
print(f.seek(0))                # 0
print(f.read())                 # b'abcdefghijkl' (correct)
f.close()

The Python 3 docs have warnings about using seek()/tell() on text files (see io.TextIOBase), and there is this one warning about append mode on some platforms (see open()):

[...] 'a' for appending (which on some Unix systems, means that all writes append to the end of the file regardless of the current seek position).

But I'm using binary files, and the writes do seem to be appending to the end of the file regardless of the seek position, so my problem is different.

My question: Is this behavior documented (directly or indirectly) somewhere, or is it at least documented that the behavior is unspecified?

Edit:

Text files do not seem to have this problem (neither in Python 2 nor 3), so it is only binary files that don't work as expected.

The Python 3 docs (io.TextIOBase) state that tell() returns an "opaque" value for text files (i.e. it is not specified how the value represents the position). Since there is no mention of whether this also applies to binary files, one might speculate that my problem is related to this opacity. However, this can't be true: even an opaque value must, when given to seek(), return the file pointer to where it was when tell() was called. In the example above, tell() returns first 6 and then 12 at the same file position (end of file), yet only seek(12) will actually move the file pointer to that position again. So the value 6 can't be explained by file pointer opacity.
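The round-trip argument can be checked with a minimal sketch like the one below. The helper name tell_before_and_after_resync and the use of a temporary file are illustrative assumptions, not part of the original example. It records the value tell() reports right after the append, then the value returned by a no-op seek(0, io.SEEK_CUR), and compares both with the real file size. On an affected interpreter the first value is expected to lag; on an interpreter without the problem all three should agree.

```python
import io
import os
import tempfile


def tell_before_and_after_resync(data=b'ghi'):
    """Return (reported, resynced, size) for a write made after seek(0) in 'a+b' mode.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Create a file with six bytes of initial content.
        with open(path, 'wb') as f:
            f.write(b'abcdef')
        with open(path, 'a+b') as f:
            f.seek(0)
            f.write(data)                      # appended at the end of the file
            reported = f.tell()                # may lag on affected versions
            resynced = f.seek(0, io.SEEK_CUR)  # no-op seek that resyncs the position
        size = os.path.getsize(path)
    finally:
        os.remove(path)
    return reported, resynced, size


reported, resynced, size = tell_before_and_after_resync()
print(reported, resynced, size)
```

If reported differs from resynced while the file pointer has not moved in between, the reported value cannot be a valid position token, opaque or not.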

Ovaflo
  • When tell() is returning 3 but you expect it to return 9, what does a read do? – interfect Jun 25 '18 at 23:34
  • @interfect, if I do an f.read() after f.tell() has returned 3 (when 9 is expected), I get the expected empty bytes object `b''` indicating that the file pointer was in fact at the end of the file and not at position 3. A side effect of this f.read() is that f.tell() will start working again and report 9 for the same position. – Ovaflo Jun 26 '18 at 19:02

2 Answers

  • When you call f.seek(offset, whence), you move the file object's position to offset, measured relative to the reference point selected by whence (the start of the file by default)
  • written = f.write(data) advances the position by written bytes
  • f.tell() returns the current position in the file

So, there's no problem here:

f.seek(0) # position = 0
f.write(b'123') # position += len(b'123') => position = 3
f.tell() # return position, which is equal to 3

The data is written starting at the current position, so you aren't appending anything in this case; you're overwriting existing data. Or at least you should be, but this behavior may differ, as the quote from the docs in your question says.
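Whether a write after seek(0) overwrites or appends depends on the mode the file was opened in. The following is a small sketch, assuming a platform where append mode forces every write to the end of the file (the question reports this for both Windows and Linux). The helper name write_at_start is made up for illustration.

```python
import os
import tempfile


def write_at_start(mode):
    """Seek to 0, write b'ghi' in the given mode, and return the final file contents.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Start from a file that already holds six bytes.
        with open(path, 'wb') as f:
            f.write(b'abcdef')
        with open(path, mode) as f:
            f.seek(0)
            f.write(b'ghi')
        with open(path, 'rb') as f:
            return f.read()
    finally:
        os.remove(path)


print(write_at_start('r+b'))  # read/write mode: the write lands at position 0
print(write_at_start('a+b'))  # append mode: the write lands at the end of the file
```

In 'r+b' mode the position arithmetic described above applies directly. In 'a+b' mode the bytes end up at the end of the file, which is the case the question is about.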

ForceBru

As I replied on issue36411:

I'm not sure it's a bug. When you write binary data to a file (which by default is opened as a buffered object derived from BufferedIOBase), the data is actually written to a buffer first. That is why tell() gets out of sync. As the documentation quoted below describes, you can, for instance, call flush() after writing to get the correct answer.

When writing to this object, data is normally placed into an internal buffer. The buffer will be written out to the underlying RawIOBase object under various conditions, including:

  1. when the buffer gets too small for all pending data;
  2. when flush() is called;
  3. when a seek() is requested (for BufferedRandom objects);
  4. when the BufferedWriter object is closed or destroyed.
Source: https://docs.python.org/3/library/io.html#io.BufferedWriter
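The flush() suggestion can be sketched as follows. The helper name tell_after_flush and the temporary file are assumptions made for illustration. The second call passes buffering=0, which for binary files makes open() return a raw, unbuffered file object, so no write buffer is involved at all.

```python
import os
import tempfile


def tell_after_flush(buffering=-1):
    """Append after seek(0) in 'a+b' mode, flush, and return (tell, file size).

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Start from a file that already holds six bytes.
        with open(path, 'wb') as f:
            f.write(b'abcdef')
        with open(path, 'a+b', buffering=buffering) as f:
            f.seek(0)
            f.write(b'ghi')
            f.flush()            # push pending bytes to the OS (nothing to do if unbuffered)
            position = f.tell()
        return position, os.path.getsize(path)
    finally:
        os.remove(path)


print(tell_after_flush())             # default buffered file object
print(tell_after_flush(buffering=0))  # raw, unbuffered file object
```

In both cases the position reported after the flush should match the file size, which suggests the mismatch lives in the write buffer's bookkeeping and not in the underlying file descriptor.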
Windsooon
  • Then why doesn't tell trigger a buffer flush? Or at least return the correct value which would be tell plus the length of the write buffer. – Dan D. May 24 '19 at 05:15
  • Buffering doesn't prevent tell() from working properly for other file modes, so why should it for this mode? And even this mode works in Python 2. – Ovaflo May 25 '19 at 09:15