Equivalent to unicode() function that works with both Python 2.7 and 3.x?

Question

I'm trying to adapt some old code to make it usable with both Python 2 and 3. I'm using the six package for this task.

In 2.7, I can replace a literal such as u'abc' with six.u('abc'), and it then works in both 2.7 and 3.x.

How do I do something similar for:

unicode(value, errors='ignore', encoding='utf-8')

There is no unicode function in 3.x and I can't just replace it with str because that will change the meaning in 2.7.

if isinstance(value, basestring): # do something

There is no basestring in 3.x and again I can't just replace it with str without changing the meaning.

Of course, I can use the py2/3 checks with six.PY2 or six.PY3 to run one of two versions but is there a better way?

For the latter, see http://stackoverflow.com/q/11301138/3001761 — jonrsharpe, Jan 10 '17 at 23:50
I think that it's difficult to answer this question generally ... In principle, you'd use `unicode` in python2.x to coerce something (presumably a `str`) into `unicode`. On python3.x, the analogy would be coercing _bytes_ into `str` -- And for that, you usually just `.decode` the bytes I think... — mgilson, Jan 10 '17 at 23:52

SethMMorton · Answer 1 · 2017-01-11T17:49:17.053

To answer the second part of the question, you can replace basestring with six.string_types in the isinstance check:

import six
if isinstance(value, six.string_types):
    pass
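If adding a dependency is not an option, the same check can be sketched without six. This is only an illustration of the idea: the names `string_types` and `is_text` below are local, hypothetical helpers, not part of any library.

```python
# Pick the base string type(s) for whichever interpreter is running.
try:
    string_types = (basestring,)  # Python 2: covers both str and unicode
except NameError:
    string_types = (str,)  # Python 3: basestring is gone, str is the text type


def is_text(value):
    """Return True if value counts as a string on the running interpreter."""
    return isinstance(value, string_types)
```

Note that on Python 3 this rejects bytes, whereas on Python 2 a byte string is a str and therefore passes.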

To answer the first part, I would first recommend putting this at the top of your code:

from __future__ import unicode_literals

This makes every string literal in the module unicode on Python 2, which is a big first step toward compatibility.

Second, if you really need some sort of compatibility conversion function, try this:

def py23_str(value):
    try:  # Python 2
        text_type = unicode
    except NameError:  # Python 3
        text_type = str
    try:
        return text_type(value, errors='ignore', encoding='utf-8')
    except TypeError:  # Not a byte string (already text, or not a string at all)
        return text_type(value)
That said, I have written a few Python 2/3 compatible libraries and have never needed a function like this. Adding from __future__ import unicode_literals at the top of the code and calling .decode on bytes objects (str in Python 2) as soon as they are created (e.g. when reading from a file opened in 'rb' mode) is all that I have needed so far.
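To illustrate decoding at the point where bytes enter the program, here is a small sketch. The helper name `read_text` and the temporary demo file are assumptions added for the example.

```python
import os
import tempfile


def read_text(path, encoding='utf-8', errors='ignore'):
    """Read a file as bytes and decode it once, at the boundary."""
    with open(path, 'rb') as handle:
        raw = handle.read()  # bytes on Python 3, str on Python 2
    return raw.decode(encoding, errors)  # text on both versions


# Demo: write some UTF-8 bytes to a temporary file and read them back as text.
fd, demo_path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'wb') as handle:
        handle.write(b'caf\xc3\xa9')
    text = read_text(demo_path)
finally:
    os.remove(demo_path)
```

Once everything is decoded this way, the rest of the code only handles text and needs no version-specific conversion.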

I understand what you're saying but I don't have control over what value can be. Suppose `value = 'xyz'`. In 2.7, `unicode(value, errors='ignore', encoding='utf-8')` will work just fine but in 3.x, the same thing with `str(value, errors='ignore', encoding='utf-8')` will produce `TypeError: decoding str is not supported`. — user2602740, Jan 11 '17 at 06:25
Have you tried `from __future__ import unicode_literals`? That may help make it so you do not need to do this explicit decoding. — SethMMorton, Jan 11 '17 at 07:02
