I am parsing multiple worksheets of unicode data and creating a dictionary for specific cells in each sheet but I am having trouble decoding the unicode data. The small snippet of the code is below
for key in shtDict:
sht = wb[key]
for row in sht.iter_rows('A:A',row_offset = 1):
for cell in row:
if isinstance(cell.value,unicode):
if "INC" in cell.value:
shtDict[key] = cell.value
The output of this section is:
{'60071508': u'\ufeffReason: INC8595939', '60074426': u'\ufeffReason. Ref INC8610481', '60071539': u'\ufeffReason: INC8603621'}
I tried to properly decode the data based on u'\ufeff' in Python string, by changing the last line to:
shtDict[key] = cell.value.decode('utf-8-sig')
But I get the following error:
Traceback (most recent call last):
File "", line 55, in <module>
shtDict[key] = cell.value.decode('utf-8-sig')
File "C:\Python27\lib\encodings\utf_8_sig.py", line 22, in decode
(output, consumed) = codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)
Not sure what the issue is, I have also tried decoding with 'utf-16', but I get the same error. Can anyone help with this?