1

I am using Python to extract data from an MSSQL database, using an ODBC connection. I am then trying to put the extracted data into an Excel file, using xlwt.

However this generates the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd8 in position 20: ordinal not in range(128)

I have run the script to just print the data and established that the offending character in the database is an O with a slash through it. On the python print it shows as "\xd8".

The worksheet encoding for xlwt is set as UTF-8.

Is there any way to have this come straight through into Excel?

Edit

Full error message below:

C:\>python dbtest1.py
Traceback (most recent call last):
  File "dbtest1.py", line 24, in <module>
    ws.write(i,j,item)
  File "build\bdist.win32\egg\xlwt\Worksheet.py", line 1032, in write
  File "build\bdist.win32\egg\xlwt\Row.py", line 240, in write
  File "build\bdist.win32\egg\xlwt\Workbook.py", line 309, in add_str
  File "build\bdist.win32\egg\xlwt\BIFFRecords.py", line 25, in add_str
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode 
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd8 in position 20: invalid
continuation byte
RowerAl
  • 51
  • 1
  • 8
  • What you call the worksheet encoding is the encoding used by xlwt to DEcode incoming str objects. The default is `ascii`. Your error message is inconsistent with "is set as UTF-8". Please edit your question to show the traceback that is associated with the error message. – John Machin Jul 22 '12 at 22:54
  • So the error message changed. Perhaps you were mistaken about what encoding you were using initially. In any case: you don't have latin1. Use cp1252 or similar, depending on your locale. Even better: persuade pyODBC to provide all text output as Python `unicode` objects. – John Machin Jul 24 '12 at 12:23
  • Yes - the top one was leaving it blank (so default ascii) and the second one was using UTF-8, I must have got a bit muddled somewhere. Thanks for your help - I'll have a look at pyODBC/a different method. – RowerAl Jul 24 '12 at 12:41

2 Answers2

4

Setting the workbook encoding to 'latin-1' seems to have achieved the same:

wb = xlwt.Workbook(encoding='latin-1') 

(It was set at 'UTF-8' before)

The other answer didn't work in my case as there were other fields that were not strings.

RowerAl
  • 51
  • 1
  • 8
-1

The SQL extraction seems to be returning strings encoded using ascii. You can convert them to unicode with:

data = unicode(input_string, 'latin-1')

You can then put them into a spreadsheet with xlwt.

xorsyst
  • 7,897
  • 5
  • 39
  • 58