How to use linecache with unicode?

Question

I open my file thus:

with open(sourceFileName, 'r', encoding='ISO-8859-1') as sourceFile:

but when I call

previousLine = linecache.getline(sourceFileName, i - 1)

I get an exception:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 169: invalid start byte

This is because (I think) linecache.getline returns a str() (which does not have a decode() method).

My script must be able to support unicode, so I can't simply convert the input file to UTF-8.

score 4 · Accepted Answer · answered Feb 25 '15 at 16:52

linecache takes a filename, not a file object, as your usage shows. It has no provision for an encoding. Also from the documentation:

This is used by the traceback module to retrieve source lines for inclusion in the formatted traceback.

This implies that it is mainly used for Python source code. As it turns out, if the file has a Python source file encoding comment, it works:

input.txt

# coding: iso-8859-1
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ
[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»
¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ

test.py

import linecache
print(linecache.getline('input.txt', 3))

Output

[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»

So linecache probably isn't the solution to your issue. Instead, open the file as you've shown and perhaps cache the lines yourself:

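# Read all lines once, decoding with the file's actual encoding.
with open('x.txt', encoding='iso-8859-1') as f:
    lines = f.readlines()
# List indexes are 0-based, so lines[2] is the third line.
print(lines[2])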

You could also append lines to a list as they are read if you don't want to read the whole file, similar to linecache.
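A minimal sketch of that lazy approach, assuming a single file and 1-based line numbers as in linecache.getline. LazyLineCache is a hypothetical helper name, not part of the standard library, and the demo writes its own small temporary file so it can run on its own:

```python
import os
import tempfile


class LazyLineCache:
    """Hypothetical helper: caches lines of one file as they are first requested."""

    def __init__(self, filename, encoding="iso-8859-1"):
        # The encoding is passed explicitly, which linecache.getline does not allow.
        self._file = open(filename, "r", encoding=encoding)
        self._lines = []

    def getline(self, lineno):
        """Return line `lineno` (1-based), or '' if it does not exist."""
        if lineno < 1:
            return ""
        # Read only as far as needed; earlier lines stay cached in the list.
        while len(self._lines) < lineno:
            line = self._file.readline()
            if not line:  # end of file reached
                break
            self._lines.append(line)
        if lineno <= len(self._lines):
            return self._lines[lineno - 1]
        return ""

    def close(self):
        self._file.close()


# Demo: byte 0xb4 is the acute accent in ISO-8859-1 and is invalid as a UTF-8 start byte.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first\nsecond \xb4\nthird\n")

cache = LazyLineCache(tmp.name, encoding="iso-8859-1")
print(ascii(cache.getline(2)))  # 'second \xb4\n'
cache.close()
os.remove(tmp.name)
```

Unlike readlines(), this only reads up to the highest line number requested so far, at the cost of keeping the file open until close() is called.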

`linecache` uses `tokenize.open()` to open a file unless the filename is a module name and the PEP 302 loader has a `get_source()` method. [It defaults to utf-8 if there is no encoding declaration](https://docs.python.org/3/library/tokenize.html#tokenize.detect_encoding). — jfs, Feb 26 '15 at 14:43
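The detection step described in that comment can be checked directly with tokenize.detect_encoding, which takes a readline callable over bytes. This is only a sketch; declared_encoding is a hypothetical helper name, and the byte strings are made-up sample inputs:

```python
import io
import tokenize


def declared_encoding(data: bytes) -> str:
    """Return the encoding tokenize would pick for the given file contents."""
    encoding, _first_lines = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


# With a coding comment on the first line, the declared encoding is used.
print(declared_encoding(b"# coding: iso-8859-1\nabc\n"))  # iso-8859-1

# Without one, the result falls back to utf-8, which is why byte 0xb4 later fails to decode.
print(declared_encoding(b"plain text\n"))  # utf-8
```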
