I am working on Python 2/3 compatibility. When working with `str` and `bytes` types, I am running into an issue. Here is an example:

    # Python 2
    x = b"%r" % u'hello'  # returns "u'hello'"
    # Python 3
    x = b"%r" % u'hello'  # returns b"'hello'"

Notice how the extra `u` prefix appears in the final value of `x` in Python 2? I need my code to return the same value on Python 2 and Python 3. My code can take in `str`, `bytes`, or `unicode` values.
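For reference, the Python 3 side of this comes from PEP 461: in `bytes` %-formatting, `%r` is an alias for `%a`, which formats the argument with `ascii()` and encodes the result. Since `ascii()` of a Python 3 `str` never starts with a `u` prefix, the prefix cannot appear. A minimal check, assuming Python 3.5 or later (where bytes %-formatting exists):

```python
# Python 3.5+ only: bytes %-formatting was added by PEP 461.
value = u'hello'

formatted = b"%r" % value                 # %r is an alias for %a in bytes formatting
via_ascii = ascii(value).encode("ascii")  # what %a produces for this value

print(formatted)               # b"'hello'"
print(formatted == via_ascii)  # True
```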
I can coerce the Python 3 value to match the Python 2 value like this:

    import six

    # six.text_type is `unicode` on Python 2 and `str` on Python 3,
    # so this check is true for text input on both versions.
    new_data = b"%r" % original_input
    # A Python 2 unicode repr already starts with u (u'...' or u"..."), so only add it when missing.
    if isinstance(original_input, six.text_type) and not new_data.startswith(b"u"):
        new_data = b"u%s" % new_data

This makes the `u'hello'` case work correctly, but it breaks the `'hello'` case.
Without the fix, a plain `'hello'` already gives the same bytes on both versions:

    # Python 2
    x = b"%r" % 'hello'  # returns "'hello'"
    # Python 3
    x = b"%r" % 'hello'  # returns b"'hello'"

The problem is that in Python 3, `u'hello'` is the same as `'hello'`, so with my code above, both `u'hello'` and `'hello'` end up returning the same result as `u'hello'` (`b"u'hello'"`) in Python 3.
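The claim that the two literals are indistinguishable in Python 3 can be checked directly: the `u` prefix is accepted by the parser (it was reintroduced by PEP 414 in Python 3.3) but produces an ordinary `str`, so nothing at runtime records it. The sketch below uses a hypothetical helper, `py2_style_repr`, that wraps a working version of the coercion snippet above, with `isinstance(value, str)` standing in for `six.text_type` so it runs on Python 3 without `six`:

```python
def py2_style_repr(value):
    """Hypothetical helper: the coercion snippet as a Python 3 function."""
    new_data = b"%r" % value
    # str plays the role of six.text_type on Python 3.
    if isinstance(value, str) and not new_data.startswith(b"u"):
        new_data = b"u%s" % new_data
    return new_data

explicit = u'hello'
plain = 'hello'

# Same type, equal, same repr: the prefix left no trace after parsing.
print(type(explicit) is type(plain))  # True
print(explicit == plain)              # True
print(repr(explicit) == repr(plain))  # True

# So the helper cannot tell them apart either.
print(py2_style_repr(explicit))  # b"u'hello'"
print(py2_style_repr(plain))     # b"u'hello'"
```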
So I need some way to tell whether a Python 3 input string was written with an explicit `u` prefix, and only run my code above in that case.
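Because the prefix is discarded when the source is parsed, it can only be observed in the source text itself, not on the resulting string object. As an illustration of that limitation (not a general fix for values passed in at runtime), the standard `tokenize` module reports each string literal exactly as written, prefix included. `literal_prefixes` is a hypothetical helper name:

```python
import io
import tokenize

def literal_prefixes(source):
    """Return the lowercased prefix of every plain string literal in `source`."""
    prefixes = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            # The token text is the literal as written, e.g. "u'hello'".
            # Everything before the first quote character is the prefix.
            quote = tok.string[-1]
            prefixes.append(tok.string[:tok.string.index(quote)].lower())
    return prefixes

print(literal_prefixes("x = u'hello'\ny = 'hello'\n"))  # ['u', '']
```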