
I'm opening a text file in UTF-16 encoding:

with open("file.txt", 'r', encoding="UTF-16") as infile:

Then I want to write to an excel file:

from csv import writer
excelFile = open("excelFile_1.csv", 'w', newline='') 
write = writer(excelFile, delimiter=',')
write.writerows([[input]])

where input is a term from the text file file.txt

I get the following error

UnicodeEncodeError: 'charmap' codec can't encode character '\xe9' in position 113: character maps to <undefined>

Using Python 3.2

Presen
1 Answer


You need to pick an output encoding for the CSV file as well:

excelFile = open("excelFile_1.csv", 'w', newline='', encoding='UTF16') 

The default codec for your system cannot handle the codepoints you are reading from the input file.
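Putting the pieces together, a minimal sketch of the whole round trip might look like the following (the sample input file and its contents are assumptions for illustration; any non-ASCII text would do):

```python
from csv import writer

# Create a sample UTF-16 input file standing in for the OP's file.txt.
with open("file.txt", "w", encoding="UTF-16") as f:
    f.write("café\nrésumé\n")

# Read UTF-16 text and write each line as a row of a UTF-16-encoded CSV.
# Passing an explicit encoding to the output file avoids the charmap error.
with open("file.txt", "r", encoding="UTF-16") as infile, \
     open("excelFile_1.csv", "w", newline="", encoding="UTF-16") as excelFile:
    write = writer(excelFile, delimiter=",")
    for line in infile:
        write.writerow([line.rstrip("\n")])
```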

Opening this file in Excel may not work; do follow the procedure in this answer, picking the UTF16 codec, to ensure that Excel reads the file correctly.

You could also try using UTF-8, adding in a UTF-8 BOM to the start of the file:

excelFile = open("excelFile_1.csv", 'w', newline='', encoding='UTF8')
excelFile.write('\ufeff')  # Zero-width non-breaking space, the Byte Order Mark

It is mostly Microsoft software that uses a BOM in UTF-8 files, since UTF-8 only has one byte order to pick from, unlike UTF-16 and UTF-32, but it apparently makes Excel happy(er).
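As a variant of the manual-BOM approach, Python ships a `utf-8-sig` codec that writes the BOM for you on the first write, so the explicit `excelFile.write('\ufeff')` line can be dropped. A small sketch (the sample row is an assumption):

```python
from csv import writer

# The utf-8-sig codec emits the UTF-8 BOM (EF BB BF) automatically,
# which is the marker Excel looks for to detect UTF-8.
with open("excelFile_1.csv", "w", newline="", encoding="utf-8-sig") as excelFile:
    write = writer(excelFile, delimiter=",")
    write.writerow(["café"])
```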

Martijn Pieters
  • I tried the second option, work great with the regular open of excel, and I didn't need to add the "\ufeff". – Presen Aug 14 '13 at 22:25
  • @user1869297 it will work without the BOM until you have some actual Unicode non-ASCII characters in the file. And I know you know this Martijn, but the purpose of the BOM in this case is not to signify byte order, it's to mark the file as UTF-8 encoded instead of one of the ancient code page encodings that Microsoft still prefers. – Mark Ransom Aug 14 '13 at 22:29
  • @MarkRansom: Yes, I know, Microsoft has to support too many legacy codecs. Note that the OP *does* have codepoints in the Latin-1 range in the output, that's why they had errors in the first place. – Martijn Pieters Aug 14 '13 at 22:32