There is no difference in the printed results, so what is the use of encoding and decoding with UTF-8? And is it encode('utf8') or encode('utf-8')?

u ='abc'
print(u)
u=u.encode('utf-8')
print(u)
uu = u.decode('utf-8')
print(uu)
william007

2 Answers

str.encode encodes a text (unicode) string into a series of bytes. In Python 3 the result is a bytes object; in Python 2 it's str again (confusingly). When you encode a unicode string, you are left with bytes, not unicode. Remember that UTF-8 is not unicode; it's an encoding method that turns unicode code points into bytes.

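decode does the reverse: it decodes the serialized byte stream with the selected codec, picking the proper unicode code points and giving you a unicode string. In Python 2 this is str.decode; in Python 3 the method is bytes.decode, since str no longer has a decode method.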
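With pure ASCII text such as 'abc', the encoded bytes look the same as the text, which is why the prints seem identical. A minimal Python 3 sketch with a non-ASCII character makes the difference visible (the sample string is my own illustration, not from the question):

```python
# Python 3 sketch; the sample text is an illustrative assumption.
text = 'h\u00e9llo'                 # 'héllo': 5 code points
encoded = text.encode('utf-8')      # bytes: é becomes the two bytes C3 A9
decoded = encoded.decode('utf-8')   # back to the original str

print(len(text), len(encoded))      # 5 6
print(encoded)                      # b'h\xc3\xa9llo'
print(decoded == text)              # True
```

The byte count differs from the character count, which is the kind of thing that matters when writing to files or sockets.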

So, what you're doing in Python 2 is: 'abc' > 'abc' > u'abc', and in Python 3 is: 'abc' > b'abc' > 'abc'. Try printing repr(u) or type(u) in addition to see what's changing where.
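As a rough sketch of that suggestion in Python 3 (the `steps` list is just a helper name I introduced), printing repr and the type name at each stage shows where the value changes even though plain print hides it for ASCII:

```python
# Python 3: show repr and type at each step of the question's round trip.
u = 'abc'
steps = [u]
steps.append(steps[-1].encode('utf-8'))   # str -> bytes
steps.append(steps[-1].decode('utf-8'))   # bytes -> str

for value in steps:
    print(repr(value), type(value).__name__)
# 'abc' str
# b'abc' bytes
# 'abc' str
```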

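Both 'utf8' and 'utf-8' work; they are aliases for the same codec. utf_8 is the name listed in the Python codec table, so it might be the most canonical, but it doesn't really matter which one you use.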
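If you want to check this yourself, here is a small sketch using the standard `codecs` module; the list of spellings is just a sample I picked:

```python
import codecs

# Python normalizes codec names, so these spellings resolve to the same codec.
spellings = ['utf8', 'utf-8', 'utf_8', 'UTF-8']
resolved = {codecs.lookup(name).name for name in spellings}
print(resolved)   # a set containing a single name

# The encoded bytes are identical whichever spelling is used.
results = {'\u00e9'.encode(name) for name in spellings}
print(results)    # {b'\xc3\xa9'}
```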

Nick T
  • Is it possible to look at the underlying byte array for the string in Python 2? – william007 Oct 06 '14 at 04:02
  • @william007 it would be implementation-specific; why--what are you trying to do? `bytearray` is a type that Python exposes to you, internally (in CPython) it's probably a `char` array. For a Python unicode string, it's stored as (again, in CPython) an array of [`Py_UNICODE` objects](https://docs.python.org/2/c-api/unicode.html#c.Py_UNICODE). They're fairly non-portable and meaningless outside of CPython. – Nick T Oct 06 '14 at 04:03

In Python 2, if you call encode on an 8-bit str, Python will first try to decode it to unicode (using the default ASCII codec) before it can encode it to UTF-8. There are also encodings that have nothing to do with character sets and can be applied to 8-bit strings.

For example:

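data = u'\u00c3'            # Unicode data (the character Ã)
data = data.encode('utf8')  # encode it to UTF-8 bytes
print(repr(data))           # repr shows the raw bytes rather than rendering them

'\xc3\x83'  # the output in Python 2 (Python 3 prints b'\xc3\x83')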
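For the encodings mentioned above that have nothing to do with character sets, a minimal Python 3 sketch might look like this. It assumes the standard bytes-to-bytes codecs 'hex' and 'base64', reached through `codecs.encode` and `codecs.decode`; the variable names are my own:

```python
import codecs

raw = b'abc'

# 'hex' and 'base64' transform bytes to bytes; no character set is involved.
as_hex = codecs.encode(raw, 'hex')
as_b64 = codecs.encode(raw, 'base64')

print(as_hex)                                # b'616263'
print(as_b64)                                # b'YWJj\n'
print(codecs.decode(as_hex, 'hex') == raw)   # True
```

In Python 2 the same codecs were available directly as 'abc'.encode('hex'), which is part of why encode on an 8-bit str existed at all.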

Please have a look here and here. It should be helpful.

Community
Avinash Babu