UnicodeEncodeError while opening files in functions in Python 3.

Question

I'm trying to display the contents of text files using functions, but I'm getting an error I don't know how to fix. I'm using Python 3.2.

Code:

from sys import argv
from os.path import exists

script, input_file = argv

def print_all(foo):
    print(foo.read())

def rewind(foo):
    print("rewinding...")
    foo.seek(0)

def print_line(n, foo):
    print(n, foo.readline())

temp = open(input_file)

print_all(temp)
rewind(temp)
print_line(1, temp)
print_line(2, temp)

Error report:

PS C:\Python32> python Sample1.py sample2.txt
Traceback (most recent call last):
  File "Sample1.py", line 18, in <module>
    print_all(temp)
  File "Sample1.py", line 7, in print_all
    print(foo.read())
  File "C:\Python32\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 1: character maps to <undefined>

The text file sample2.txt contains only English letters.

It is not the reading that is the issue, but the *printing*. Your console cannot handle the codepoints you are trying to write to it. — Martijn Pieters, Mar 06 '15 at 13:23
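The failure can be isolated to the write step. This minimal sketch assumes that an in-memory stream which encodes to cp437 is a fair stand-in for the asker's console; the helper `write_to_cp437_console` is hypothetical. The string passed in is already decoded text, so nothing is read here. The `UnicodeEncodeError` comes from `print()` converting that text to cp437 bytes.

```python
import io

def write_to_cp437_console(text):
    """Hypothetical helper: print text to an in-memory stream that encodes
    to cp437, standing in (as an assumption) for a cp437 Windows console.
    Returns the bytes that reached the stand-in console."""
    raw = io.BytesIO()
    console = io.TextIOWrapper(raw, encoding="cp437")
    print(text, file=console)  # the str -> cp437 bytes conversion happens here
    console.flush()
    return raw.getvalue()

# Plain ASCII text encodes without trouble.
print(write_to_cp437_console("abc"))

# "\xfe" is a perfectly valid str; the error appears only when encoding it.
try:
    write_to_cp437_console("a\xfe")
except UnicodeEncodeError as exc:
    print("print() failed:", exc.reason)  # character maps to <undefined>
```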
Although there actually is a slight reading issue here. Just because you're only using the "English alphabet" in the file doesn't necessarily mean that it's using the optimal charset for it. — Ignacio Vazquez-Abrams, Mar 06 '15 at 13:24
@MartijnPieters: I'm not sure I completely understand what you mean. I've read and printed the contents of text files in Windows PowerShell before; it is only now, when I'm using functions to do so, that I'm getting an error. — Vishal Khanna, Mar 06 '15 at 13:27
@IgnacioVazquez-Abrams: I don't get an error if I display the text on its own. Only when I use functions for read/write operations do I get the error. — Vishal Khanna, Mar 06 '15 at 13:29
@VishalKhanna: You are getting an **encoding** error; unicode data is being converted (encoded) to bytes, to match your console codec. Your console is configured to use the [437 codepage](http://en.wikipedia.org/wiki/Code_page_437) and the Latin character [U+00FE LATIN SMALL LETTER THORN](http://codepoints.net/U+00FE) cannot be represented in that codepage. — Martijn Pieters, Mar 06 '15 at 13:41
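The codepage limitation can be checked directly with `str.encode`, with no console or file involved. The helper `cp437_can_encode` below is hypothetical; `errors="replace"` and `errors="backslashreplace"` are standard codec error handlers that avoid the exception by producing lossy or escaped output instead.

```python
def cp437_can_encode(ch):
    """Hypothetical helper: True if code page 437 has a byte for ch."""
    try:
        ch.encode("cp437")
    except UnicodeEncodeError:
        return False
    return True

print(cp437_can_encode("A"))     # True
print(cp437_can_encode("\xff"))  # True: U+00FF has a cp437 byte
print(cp437_can_encode("\xfe"))  # False: U+00FE LATIN SMALL LETTER THORN

# Standard error handlers trade the exception for altered output.
print("\xfe".encode("cp437", errors="replace"))           # b'?'
print("\xfe".encode("cp437", errors="backslashreplace"))  # b'\\xfe'
```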
@VishalKhanna: this all has nothing to do with functions. If you moved that `print()` call out of the function and used it directly, it would **still fail**. — Martijn Pieters, Mar 06 '15 at 13:42
@VishalKhanna: note that you opened the file without specifying a codec; whatever you opened was then decoded using the default encoding as configured for your platform. See the encoding information in the [`open()` function documentation](https://docs.python.org/3/library/functions.html#open); the [`locale.getpreferredencoding()` function](https://docs.python.org/3/library/locale.html#locale.getpreferredencoding) is called to determine what to decode the file as. The interpretation of your data may be incorrect right there. — Martijn Pieters, Mar 06 '15 at 13:44
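A minimal sketch of the difference an explicit `encoding=` argument makes. For illustration it assumes sample2.txt is UTF-16-LE with a byte order mark; the file written here is a throwaway stand-in, and `read_text` is a hypothetical helper. Output goes through `ascii()` so the demo itself cannot trip over a cp437 console.

```python
import codecs
import locale
import os
import tempfile

def read_text(path, encoding=None):
    """Hypothetical helper: read a whole file as text.
    encoding=None makes open() fall back to the platform default."""
    with open(path, encoding=encoding) as f:
        return f.read()

# The codec open() uses when no encoding= is passed.
print("platform default:", locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample2.txt")
    # Assumed file contents: a UTF-16-LE byte order mark followed by "Hello".
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF16_LE + "Hello".encode("utf-16-le"))
    as_utf16 = read_text(path, encoding="utf-16")   # BOM detected and consumed
    as_cp1252 = read_text(path, encoding="cp1252")  # every byte becomes a char

print(ascii(as_utf16))   # 'Hello'
print(ascii(as_cp1252))  # '\xff\xfeH\x00e\x00l\x00l\x00o\x00'
```

Passing `encoding="cp1252"` mimics what a bare `open()` would do if the platform default were cp1252 (an assumption about the asker's machine). Decoding does not fail; it silently produces the wrong text, which only blows up later when printed.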
@VishalKhanna: thinking about this some more, if the `\xfe` byte is the *very first byte* in the file, you could well have a UTF-16-encoded file, which usually starts with a *Byte Order Mark*, the bytes `\xfe\xff` or `\xff\xfe` depending on the actual byte order. The big-endian byte order mark thus would be wrongly decoded to the Latin-1 character U+00FE. — Martijn Pieters, Mar 06 '15 at 13:48
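Following that idea, the mark can be checked before choosing a codec. This is a minimal sketch with a hypothetical helper, `guess_bom_encoding`. It also assumes the asker's platform default was cp1252. Under that assumption, the little-endian mark `b'\xff\xfe'` decodes to `'\xff\xfe'` ('ÿþ'). cp437 can encode 'ÿ' but not 'þ', which would fit the traceback's "position 1". A file without a BOM falls back to `default`.

```python
import codecs
import os
import tempfile

# Byte order marks, longest first: the UTF-32-LE mark (FF FE 00 00)
# begins with the UTF-16-LE mark (FF FE), so it must be checked first.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def guess_bom_encoding(path, default=None):
    """Hypothetical helper: pick a codec from the file's byte order mark.
    Returns `default` when no BOM is found; None passed on to open()
    means "use the platform default", as the original script did."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, encoding in BOMS:
        if head.startswith(bom):
            return encoding
    return default

# Assuming a cp1252 default: the UTF-16-LE BOM is misread as '\xff\xfe'.
misread = codecs.BOM_UTF16_LE.decode("cp1252")
try:
    misread.encode("cp437")
except UnicodeEncodeError as exc:
    print("fails at position", exc.start)  # fails at position 1

# Usage with a throwaway UTF-16 file standing in for sample2.txt.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample2.txt")
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF16_LE + "abc\n".encode("utf-16-le"))
    encoding = guess_bom_encoding(path)
    with open(path, encoding=encoding) as f:
        print(encoding, ascii(f.read()))  # utf-16 'abc\n'
```

In the original script this would amount to `temp = open(input_file, encoding=guess_bom_encoding(input_file))`; files without a BOM keep the old behaviour.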