Seeking from end of file throwing unsupported exception

Question

I have this code snippet, and I'm trying to seek backwards from the end of a file using Python:

f = open(r'D:\SGStat.txt', 'a')
f.seek(0, 2)
f.seek(-3, 2)

This throws the following exception while running:

f.seek(-3,2)
io.UnsupportedOperation: can't do nonzero end-relative seeks

Am I missing something here?

Python 3 only supports text file seeks from the beginning of the file. If you want to get the last three lines of a file, you can use deque(f, 3) to iterate over just those lines. — Dane White, Feb 03 '14 at 20:44
You can no longer seek to arbitrary positions in a text file *by design*. That's because encodings like UTF-8 have an unpredictable number of bytes per character. seek() cannot blindly jump to a position in a file and expect to be at the beginning of a character. – Philip Couling, Feb 26 '20 at 13:08
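
The deque suggestion from the first comment can be sketched as follows, assuming the goal is the last three lines rather than the last three bytes. The sample file and its contents are invented for the example.

```python
import collections
import os
import tempfile

# Hypothetical sample file; any text file would do.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\nthree\nfour\nfive\n")
    path = tmp.name

with open(path, encoding="utf-8") as f:
    # A deque with maxlen=3 discards older lines as newer ones arrive,
    # so only the final three lines remain once the file is exhausted.
    last_lines = list(collections.deque(f, maxlen=3))

os.remove(path)
print(last_lines)
```

This still reads the whole file from the start, so it trades speed on large files for never needing an end-relative seek.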

score 66 · Accepted Answer · edited Jan 31 '21 at 06:21

From the documentation for Python 3.2 and up:

In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with seek(0, 2)).

This is because text files do not have a 1-to-1 correspondence between encoded bytes and the characters they represent, so seek can't tell where to jump to in the file to move by a certain number of characters.
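
A minimal sketch of that mismatch, assuming UTF-8 (the sample string is made up for illustration):

```python
# Illustrative string: three characters, one of which is non-ASCII.
text = "a£b"
encoded = text.encode("utf-8")

char_count = len(text)      # number of characters in the str
byte_count = len(encoded)   # number of bytes the text occupies as UTF-8

print(char_count, byte_count)
# '£' occupies two bytes in UTF-8, so byte offsets and character
# offsets no longer agree for anything that comes after it.
print(encoded)
```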

If your program is okay with working in terms of raw bytes, you can change your program to read:

f = open(r'D:\SGStat.txt', 'ab')
f.seek(-3, 2)

Note the b in the mode string, for a binary file. (Also note the removal of the redundant f.seek(0, 2) call.)

However, you should be aware that adding the b flag when you are reading or writing text can have unintended consequences (with multibyte encoding for example), and in fact changes the type of data read or written.
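
To illustrate the change of type, here is a minimal sketch that reads the last three bytes in binary mode. It uses a temporary file with made-up ASCII contents rather than the file from the question, and the final decode step is only safe because the content is known to be ASCII.

```python
import os
import tempfile

# Hypothetical sample file standing in for the one in the question.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"hello world")
    path = tmp.name

with open(path, "rb") as f:      # 'b' selects binary mode, so end-relative seeks are allowed
    f.seek(-3, os.SEEK_END)      # three bytes before the end of the file
    tail = f.read()              # a bytes object, not a str

os.remove(path)

print(type(tail).__name__, tail)
print(tail.decode("ascii"))      # decode explicitly to get text back
```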

score 59 · Answer 2 · edited Jun 25 '22 at 01:32

The existing answers do answer the question, but provide no solution.

As pointed out in the comments, this answer is based on undefined behavior and does not handle UnicodeDecodeError, which you may encounter with UTF-8 files. It works fine with ASCII and other fixed-width encodings as long as you seek to the beginning of a character. Please see Philip's answer which includes a workaround and further comments discussing why seeking backwards in UTF-8 is a problem.

From readthedocs:

If the file is opened in text mode (without b), only offsets returned by tell() are legal. Use of other offsets causes undefined behavior.

This is supported by the documentation, which says that:

In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file [os.SEEK_SET] are allowed...

This means if you have this code from old Python:

f.seek(-1, 1)   # seek -1 from current position

it would look like this in Python 3:

f.seek(f.tell() - 1, os.SEEK_SET)   # os.SEEK_SET == 0

Solution

Putting this information together we can achieve the goal of the OP:

f.seek(0, os.SEEK_END)              # seek to end of file; f.seek(0, 2) is legal
f.seek(f.tell() - 3, os.SEEK_SET)   # go backwards 3 bytes

edited Jun 25 '22 at 01:32

answered Jul 02 '18 at 07:41

Eric Lindsey


Great answer. Finally, a solution that works without the side effects of using a binary mode – Anupam Jan 06 '20 at 05:45

**This does not work!** It causes a UnicodeDecodeError. Example: create a text file starting with `a£b`. Open the file and call `f.seek(2, os.SEEK_SET)`, then `print(f.read(1))`. That's because for text files f.tell() is an opaque number and "[the only valid offset values are those returned from the f.tell()](https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects)". f.tell() - 3 was never returned by f.tell(), so there is no guarantee whatsoever that it will leave the file object in a readable state. It could be right in the middle of a multibyte UTF-8 character. – Philip Couling Feb 26 '20 at 12:51

Ironically, this answer even quotes a passage from the docs which says it is "undefined behavior". – Philip Couling Feb 26 '20 at 13:00

`f.tell() - 1` is not an "offset returned by `tell()`", and is not a valid seek offset. This answer doesn't work. – user2357112 Jan 31 '21 at 06:11

score 5 · Answer 3 · edited Jun 20 '20 at 09:12

Eric Lindsey's answer does not work because UTF-8 files can have more than one byte per character. Worse, for those of us who speak English as a first language and work with English-only files, it might work just long enough to get into production code and then really break things.
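
A minimal reproduction of that failure, assuming CPython and a UTF-8 file whose second character is `£` (the temporary file and its contents are made up for the example):

```python
import os
import tempfile

with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("a£b\n")
    path = tmp.name

decode_failed = False
with open(path, encoding="utf-8") as f:
    # Offset 2 is the second byte of '£', not the start of a character.
    # Seeking to an offset that tell() never returned is undefined behavior.
    f.seek(2, os.SEEK_SET)
    try:
        f.read(1)
    except UnicodeDecodeError:
        decode_failed = True

os.remove(path)
print(decode_failed)
```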

The following answer is based on undefined behavior

... but it does work for now for UTF-8 in Python 3.7.

To seek backwards through a file in text mode, you can do so as long as you correctly handle the UnicodeDecodeError caused by seeking to a byte which is not the start of a UTF-8 character. Since we are seeking backwards, we can simply seek back an extra byte until we find the start of the character.

The result of f.tell() is still the byte position in the file for UTF-8 files, at least for now. So an f.seek() to an invalid offset will raise a UnicodeDecodeError when you subsequently f.read(), and this can be corrected by calling f.seek() again with a different offset. At least this works for now.

E.g., seeking to the beginning of a line (just after the \n):

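import os

# f is assumed to be a file already opened in text mode with a UTF-8 encoding.
pos = max(f.tell() - 1, 0)
f.seek(pos, os.SEEK_SET)
while pos > 0:
    try:
        character = f.read(1)
        if character == '\n':
            break  # the file position is now just after the newline
    except UnicodeDecodeError:
        pass  # landed in the middle of a character; step back another byte
    pos -= 1
    f.seek(pos, os.SEEK_SET)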

Note that while adjusting seek positions until you hit a valid one will let you seek *somewhere* in the file, you typically won't know how many characters you just jumped. That happens to not be an issue for the example code in the answer, but it will be an issue if you try to do something like `f.seek(f.tell() - 1000, os.SEEK_SET)` to jump back 1000 characters. — user2357112, Jan 31 '21 at 06:27

score 0 · Answer 4 · edited Apr 13 '18 at 13:05


In order to seek relative to the current position or the end of the file, you have to open the text file in binary mode. In this example I created a file "nums.txt" containing "ABCDEFGHIJKLMNOPQRSTUVWXYZ", then read the letters of the string "PYTHON" from the file one at a time and displayed them. I ran the code in Python 3.6 on Windows with Anaconda 4.2:

    >>> file=open('nums.txt','rb')
    >>> file.seek(15,0)
    15
    >>> file.read(1).decode('utf-8')
    'P'
    >>> file.seek(8,1)
    24
    >>> file.read(1).decode('utf-8')
    'Y'
    >>> file.seek(-7,2)
    19
    >>> file.read(1).decode('utf-8')
    'T'
    >>> file.seek(7,0)
    7
    >>> file.read(1).decode('utf-8')
    'H'
    >>> file.seek(6,1)
    14
    >>> file.read(1).decode('utf-8')
    'O'
    >>> file.seek(-2,1)
    13
    >>> file.read(1).decode('utf-8')
    'N'

edited Apr 13 '18 at 13:05

Kevin


answered Nov 24 '17 at 06:55

Vikas Thada


This isn't what's intended, and opening the file in binary mode can have unintended consequences. – Coddy Jan 24 '20 at 20:45

@Coddy what unintended consequences? If you know the encoding (and you should know if you want to treat the file as text), then any fixed width encoding including ASCII, Latin-1 and both UTF-16 and UTF-32 encodings give you predictable offsets. UTF-8 only requires a bit of bit masking. – Martijn Pieters Jan 25 '20 at 10:09

UTF-8 is not ASCII. UTF-8 does not have only 1 byte per character for code points >= 128. This will only work if every character in the file has a Unicode codepoint <= 127. Simple examples that will break this code would be characters: `£`, `€`, `¥` – Philip Couling Feb 26 '20 at 14:39
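
The bit-masking approach mentioned in the comments can be sketched like this. In UTF-8, continuation bytes always have the form 10xxxxxx, so a byte for which `byte & 0xC0 == 0x80` is not the start of a character. The helper name `read_last_chars` and the sample data are hypothetical, and the sketch assumes the file is valid UTF-8 and that stepping back one byte at a time is acceptable.

```python
import os
import tempfile

def read_last_chars(path, n):
    """Return the last n code points of a UTF-8 file (hypothetical helper)."""
    with open(path, "rb") as f:
        pos = f.seek(0, os.SEEK_END)   # in binary mode seek() returns the byte offset
        remaining = n
        while pos > 0 and remaining > 0:
            pos -= 1
            f.seek(pos)
            byte = f.read(1)[0]
            # Continuation bytes look like 10xxxxxx; anything else starts a character.
            if byte & 0xC0 != 0x80:
                remaining -= 1
        f.seek(pos)
        return f.read().decode("utf-8")

# Made-up sample content containing two- and three-byte characters.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("price: 5£ or 6€".encode("utf-8"))
    path = tmp.name

tail = read_last_chars(path, 3)
os.remove(path)
print(tail)
```

Because the file is opened in binary mode, every offset here is a real byte position, so none of the seeks rely on undefined behavior.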
