I open my file thus:

with open(sourceFileName, 'r', encoding='ISO-8859-1') as sourceFile:

but when I call

previousLine = linecache.getline(sourceFileName, i - 1)

I get an exception:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 169: invalid start byte

This is because (I think) linecache.getline returns a str() (which does not have a decode() method).

My script must be able to support Unicode, so I can't simply convert the input file to UTF-8.

Mawg says reinstate Monica

1 Answer

linecache takes a filename, not a file object, as your usage shows. It has no provision for an encoding. Also from the documentation:

This is used by the traceback module to retrieve source lines for inclusion in the formatted traceback.

This implies that it is mainly used for Python source code. As it turns out, if the file has a Python source file encoding comment, it works:

input.txt

# coding: iso-8859-1
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ
[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»
¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ

test.py

import linecache
print(linecache.getline('input.txt', 3))

Output

[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»
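The effect of the encoding comment can be checked in isolation with tokenize.detect_encoding, the documented function that looks for such a declaration and falls back to utf-8 when there is none. This is only a sketch: the helper name detected_encoding and the sample byte strings are made up for illustration, and it shows the detection step alone, not everything linecache does.

```python
import io
import tokenize


def detected_encoding(data: bytes) -> str:
    """Return the encoding tokenize would choose for these file contents.

    Hypothetical helper for illustration; it only wraps
    tokenize.detect_encoding, which expects a readline callable
    that yields bytes.
    """
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


# Sample contents (assumed for this sketch, not taken from a real file).
with_cookie = b"# coding: iso-8859-1\nline two\n"
without_cookie = b"line one\nline two\n"

print(detected_encoding(with_cookie))     # iso-8859-1
print(detected_encoding(without_cookie))  # utf-8
```

Without the declaration the file is treated as UTF-8, which is consistent with the 'utf-8' codec named in the UnicodeDecodeError from the question.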

So linecache probably isn't the solution to your issue. Instead, open the file as you've shown and perhaps cache the lines yourself:

with open('x.txt', encoding='iso-8859-1') as f:
    lines = f.readlines()  # cache every line, decoded as ISO-8859-1
print(lines[2])  # list indices are 0-based, so this is line 3

You could also append lines to a list as they are read if you don't want to read the whole file, similar to linecache.
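A minimal sketch of that incremental approach follows. The class name LazyLineCache and its methods are hypothetical, not part of linecache or any library; the only behaviour borrowed from linecache.getline is the 1-based line number and the empty string returned for a line that does not exist.

```python
class LazyLineCache:
    """Hypothetical helper: reads lines on demand and remembers them."""

    def __init__(self, filename, encoding):
        # The encoding is passed explicitly, which linecache does not allow.
        self._file = open(filename, 'r', encoding=encoding)
        self._lines = []

    def getline(self, lineno):
        """Return line `lineno` (1-based), or '' if it does not exist."""
        if lineno < 1:
            return ''
        # Read only as far as needed; earlier lines stay cached in the list.
        while len(self._lines) < lineno:
            line = self._file.readline()
            if not line:  # reached end of file before the requested line
                return ''
            self._lines.append(line)
        return self._lines[lineno - 1]

    def close(self):
        self._file.close()
```

With something like this, the call from the question could become cache.getline(i - 1) after creating cache = LazyLineCache(sourceFileName, 'ISO-8859-1'), and cache.close() once the file is no longer needed.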

Mark Tolonen
`linecache` uses `tokenize.open()` to open a file unless the filename is a module name and the PEP 302 loader has a `get_source()` method. It defaults to utf-8 if there is no encoding declaration (https://docs.python.org/3/library/tokenize.html#tokenize.detect_encoding). – jfs Feb 26 '15 at 14:43