Robust unicode conversion to allow printing

Question

I am comparing two lists of dictionaries for equivalence. The data is from two sources outside of my control. If any of the fields are different I print out the two values:

        if event[field_name] == other_event[field_name]:
            print field_name, u'OK,'
        else:
            print field_name, u':', event[field_name], other_event[field_name]

However, the data is international in nature, and it seems that somewhere along the line it is being encoded as ASCII, so I sometimes get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 11: ordinal not in range(128)

What is the best way to convert the data so that it can be printed without the error? Note that the data is not all text: a value may be a boolean, an integer, or even None, so I need a solution that handles non-strings the same way print does.

The platform this code is operating on has Python 2 (Python 2.7.10 to be precise), but it would be advantageous if the solution was also compatible with Python 3 as it may need to run in a Python 3 environment in the future.

I checked Handle wrongly encoded character in Python unicode string, but my problem seems to be different as I can output u'\xfc' fine at the interactive prompt:

>>> print u'Gl\xfcck'
Glück

Thanks

You should probably drop Python 2 and switch to Python 3 once and for all. — ForceBru, Mar 05 '18 at 13:02
@ForceBru not in my control unfortunately, but if you do have a python3 specific answer please post it, others without my restriction may have the same problem — Raffles, Mar 05 '18 at 13:40
@Raffles, sure: Python 3 has built-in Unicode support, so you'll never ever need to worry about Unicode and these ugly `u` prefixes and the distinctions between `unicode` and `str`, etc. So, just run your code with Python 3 and fix all the syntax errors it'll start throwing at you. — ForceBru, Mar 05 '18 at 13:43
@PeterWood thanks - although my problem seems to be slightly different one of the non-accepted answers to that problem did work. Much appreciated — Raffles, Mar 05 '18 at 14:02

score 0 · Answer 1 · answered Mar 05 '18 at 14:01

This worked:

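import sys  # import the sys module, if not already imported
reload(sys)  # Python 2 only: restores setdefaultencoding, which site.py removes at startup
sys.setdefaultencoding('utf-8')  # implicit str/unicode conversions now use UTF-8 instead of ASCII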

This comes from an answer to "Handle wrongly encoded character in Python unicode string", a couple of answers below the accepted one.
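
Be aware that reload(sys) and sys.setdefaultencoding exist only in Python 2 and change the default for the whole process, so this fix will not carry over to Python 3. As a rough sketch of a more portable approach, you could convert every value to text explicitly and let unencodable characters degrade to escapes instead of raising. The helper names to_text and safe_print are hypothetical (not from any library), and the sketch assumes any byte strings in the data are UTF-8:

```python
import sys

try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str  # Python 3


def to_text(value, encoding='utf-8'):
    """Return a text representation of any value (hypothetical helper)."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):
        # Assumption: byte strings are UTF-8; undecodable bytes are replaced.
        return value.decode(encoding, 'replace')
    # Non-strings (bool, int, None, ...) use their usual string form.
    return text_type(value)


def safe_print(*values):
    """Print values without raising UnicodeEncodeError (hypothetical helper)."""
    line = u' '.join(to_text(v) for v in values)
    # Fall back to ASCII when the output stream reports no encoding.
    out_encoding = getattr(sys.stdout, 'encoding', None) or 'ascii'
    # Round-trip through the output encoding so characters it cannot
    # represent become backslash escapes instead of raising.
    print(line.encode(out_encoding, 'backslashreplace').decode(out_encoding))
```

With something like this in place, the print line in the comparison loop would become safe_print(field_name, u':', event[field_name], other_event[field_name]).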

Raffles