Openpyxl Unicode decode error cannot remove \ufeff from cell value

Question

I am parsing multiple worksheets of Unicode data and building a dictionary from specific cells in each sheet, but I am having trouble decoding the Unicode data. A small snippet of the code is below:

for key in shtDict:
    sht = wb[key] 
    for row in sht.iter_rows('A:A',row_offset = 1):
        for cell in row:
            if isinstance(cell.value,unicode):
                if "INC" in cell.value:
                    shtDict[key] = cell.value

The output of this section is:

{'60071508': u'\ufeffReason: INC8595939', '60074426': u'\ufeffReason. Ref INC8610481', '60071539': u'\ufeffReason: INC8603621'}

I tried to properly decode the data, following the question "u'\ufeff' in Python string", by changing the last line to:

shtDict[key] = cell.value.decode('utf-8-sig')

But I get the following error:

Traceback (most recent call last):
  File "", line 55, in <module>
    shtDict[key] = cell.value.decode('utf-8-sig')
  File "C:\Python27\lib\encodings\utf_8_sig.py", line 22, in decode
    (output, consumed) = codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)

I'm not sure what the issue is. I have also tried decoding with 'utf-16', but I get the same error. Can anyone help with this?

You use `decode()` to go from an encoded string to unicode. Hence, you don't need to try and decode anything that is already unicode. — Charlie Clark, Apr 26 '18 at 13:18

score 3 · Accepted Answer · answered Apr 26 '18 at 12:57

Just make it simpler: the BOM (byte order mark, U+FEFF) can be ignored, so just strip it from the value.

shtDict[key] = cell.value.replace(u'\ufeff', '', 1)

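Note: cell.value is already of type unicode (you just checked it), so you cannot decode it again. In Python 2, calling decode() on a unicode object first implicitly encodes it with the ASCII codec, which is why the traceback shows a UnicodeEncodeError rather than a decode error.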
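
As a rough sketch of the two cases (text that is already unicode versus raw bytes), the snippet below uses a hypothetical helper named `strip_bom` and sample values copied from the question's output; neither the helper nor the `values` dictionary is part of openpyxl. It removes the BOM only when it is the first character, and it shows that 'utf-8-sig' is meant for decoding bytes, not for text that is already decoded.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch; strip_bom and the sample data are assumptions,
# not part of openpyxl or the original code.
BOM = u'\ufeff'


def strip_bom(text):
    """Return text without one leading U+FEFF; other text is unchanged."""
    if text.startswith(BOM):
        return text[len(BOM):]
    return text


# Sample values taken from the output shown in the question.
values = {
    '60071508': u'\ufeffReason: INC8595939',
    '60074426': u'\ufeffReason. Ref INC8610481',
}
cleaned = {key: strip_bom(value) for key, value in values.items()}

# The 'utf-8-sig' codec applies to raw bytes: it decodes UTF-8 and
# drops a leading BOM in the same step.
raw = u'\ufeffReason: INC8595939'.encode('utf-8')
decoded = raw.decode('utf-8-sig')
```

Compared with `replace(u'\ufeff', '', 1)`, checking `startswith` first leaves a U+FEFF that appears later in the string untouched, which may or may not matter for your data.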

Giacomo Catenazzi

Thanks for the help, and the note. Still trying to get a better grasp on Unicode encoding and decoding – mickNeill Apr 26 '18 at 13:05
