Has anyone been able to write out UTF-8 characters using python's xlwt?

Question

I'm trying to write data to an excel file that includes Japanese characters. I'm using codec.open() to get the data, and that seems to work fine, but I run into this error when I try to write the data:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 16-17: ordinal not in range(128)

I don't understand why the program would be insisting on using ascii here. When I created a new workbook object, I did so using

wb = xlwt.Workbook(encoding='utf-8')

and both the program file itself and the file it's reading in are saved as UTF-8.

Anybody have any ideas?

EDIT: Here's a link to the xlwt package. http://pypi.python.org/pypi/xlwt

They're mostly useless. You're supposed to use `unicode`s instead. — Ignacio Vazquez-Abrams, Aug 25 '11 at 02:34
xlwt is not part of the standard library. It would be a good idea to provide a link — Karl Knechtel, Aug 25 '11 at 02:43
@Ignacio, I think you have that backwards. Unicode is a concept that doesn't specify a physical representation; you *must* encode it into something when you write it out. UTF-8 is a very good choice for that. — Mark Ransom, Aug 25 '11 at 04:21
Thanks for exposing me to xlwt, I had never seen it before. Could have come in handy in the past. — Mark Ransom, Aug 25 '11 at 04:26
@Mark: Sure, but you never actually write them out yourself; you let an encoder handle that, and write out `unicode`s. — Ignacio Vazquez-Abrams, Aug 25 '11 at 07:57
Hmm. OK. I'm still kind of fuzzy on the exact implementation. Would it help if I posted my code? — StormShadow, Aug 31 '11 at 10:10

score 16 · Accepted Answer · answered Sep 01 '11 at 03:39

In an Excel 97-2003 XLS file, each piece of text is encoded in latin1 if that is possible, otherwise UTF-16LE, with a flag to show which. To do that, xlwt nees a unicode object. If the caller supplies a str object, xlwt will attempt to decode it using the encoding specified in the Workbook() call (default is ascii).

This works; try running the following short script and open the resultant file with Excel.

import xlwt
wb = xlwt.Workbook(encoding="UTF-8")
uc = u"".join(unichr(0x0410 + i) for i in xrange(32)) # some Cyrillic characters
u8 = uc.encode("UTF-8")
ws = wb.add_sheet("demo")
ws.write(0, 0, uc)
ws.write(1, 0, u8)
ws.write(2, 0, xlwt.Formula("A1=A2"))
ws.write(3, 0, "ASCII is a subset of UTF-8")
wb.save("xlwt_write_utf8.xls")

The fact that you are getting an encode error, not a decode error, indicates a possible problem in the file input part of your script. Please supply the shortest possible script that causes the error that you are getting. The script should contain something like print repr(your_utf8_text) immediately prior to the failing statement, so that we can see exactly what the text data is. Please include the full error message and the full traceback, and the contents (print repr(contents)) of your very short input file.

That works brilliantly! At least I know I have hope. But the data goes through so many transformations before it gets to xlwt that I don't think I can post just a small snippet. I can tell you that my problem lies, I think, in your third and fourth lines. You are creating unicode text from scratch, and I'm reading it in from a file. Also, I don't think that I've used .encode in my program. I will try that and get back to you. In any case, great answer. — StormShadow, Sep 05 '11 at 09:07
@StormShadow: The problem is very likely to be in the "so many transformations before it gets to xlwt". You need to establish where in your code it is failing. This is usually determined by inspecting the traceback that should be generated when you get the UnicodeEncodeError. If this doesn't help you, publish the traceback. You are probably correct when you say that you are not calling `encode` explicitly; that error can happen when you are mixing `unicode` and `str` objects and something internal thinks it needs to decode a `str` object and uses the default encoding (in this case, `ascii`). — John Machin, Sep 05 '11 at 21:32

score 2 · Answer 2 · edited May 23 '17 at 12:02

2

As suggested by this question, setting the encoding on the WorkBook

wb = xlwt.Workbook(encoding='latin-1')

should also resolve the issue (it worked for me).

edited May 23 '17 at 12:02

Community

1
1

answered Aug 01 '13 at 14:47

Troy

21,172
20
74
103

Has anyone been able to write out UTF-8 characters using python's xlwt?

2 Answers2

Linked

Related