TL;DR
Python 2.6+ bytes = Python 2.6+ str = Python 3.x bytes != Python 3.x str
Python 2.6+ bytearray = Python 3.x bytearray
Python 2.x unicode = Python 3.x str
Long Answer
The meanings of bytes and str changed in Python 3.x.
To answer your question briefly: in Python 2.6, bytes(b"hi") is an immutable sequence of bytes (8-bit values, or octets). In Python 2.6+, bytes is simply an alias for str, so each element you index out of it is itself a one-character str. (This is not the case in Python 3.x.)
bytearray(b"hi"), by contrast, is a mutable array of bytes. When you ask for the type of one of its elements, you get an int, because Python represents each element of a bytearray as an integer in the range 0-255 (all possible values of an unsigned 8-bit integer). An element of a Python 2 bytes object, however, comes back as the one-character string for that byte.
For example, in Python 2.6+:
>>> barr = bytearray(b'hi')
>>> bs = bytes(b'hi')
>>> barr[0]  # bytearray elements are ints: the 8 bits 0110 1000
104
>>> bs[0]  # Python 2 bytes (str) elements are one-character strings
'h'
>>> chr(barr[0])  # chr converts 104 to its corresponding ASCII character
'h'
>>> bs[0] == chr(barr[0])  # the first elements match once the int is converted
True
Python 3.x is an entirely different story. As you might have suspected, it is odd that a str literal would mean bytes in Python 2.6+. This answer explains why.
In Python 3.x, a str is Unicode text (in Python 2 it was just an array of bytes; note that Unicode text and bytes are two completely different things). bytearray is a mutable array of bytes, while bytes is an immutable array of bytes. They offer almost the same methods. Running the same code again in Python 3.x gives:
>>> barr = bytearray(b'hi')
>>> bs = bytes(b'hi')
>>> barr[0]
104
>>> bs[0]
104
>>> bs[0] == barr[0]  # both types index to ints in Python 3.x
True
bytes and bytearray behave the same way in Python 3.x, except for their mutability.
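As a rough illustration of that mutability difference, here is a minimal sketch for Python 3. The variable names are made up for the example.

```python
# Python 3 sketch; `frozen` and `editable` are illustrative names only.
frozen = bytes(b"hi")
editable = bytearray(b"hi")

# Indexing either type yields an int in Python 3.
print(frozen[0], editable[0])  # 104 104

# bytearray supports item assignment; 72 is the ASCII code for "H".
editable[0] = 72

# bytes does not, so the same assignment raises TypeError.
try:
    frozen[0] = 72
    bytes_assignment_failed = False
except TypeError:
    bytes_assignment_failed = True

print(editable)                 # bytearray(b'Hi')
print(bytes_assignment_failed)  # True
```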
What happened to str, you might ask? str in Python 3 became what unicode was in Python 2, and the separate unicode type was removed from Python 3 because it was redundant.
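To make the text/bytes split concrete, the following sketch (Python 3 only) encodes a str to bytes and back. The sample string and the choice of UTF-8 are assumptions for the example, not something the question specifies.

```python
# Python 3 sketch: str holds Unicode code points, bytes holds encoded octets.
text = "h\u00e9llo"            # "héllo": 5 code points
raw = text.encode("utf-8")     # the e-acute takes 2 bytes in UTF-8

print(type(text).__name__, len(text))  # str 5
print(type(raw).__name__, len(raw))    # bytes 6

roundtrip = raw.decode("utf-8")
print(roundtrip == text)  # True

# A str and a bytes object never compare equal in Python 3.
print(text == raw)  # False
```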
I'd like to write code that will translate well into Python 3. So, is the situation the same in Python 3?
It depends on what you are trying to do. Are you dealing with raw bytes, or with an ASCII (text) representation of bytes?
If you are dealing with bytes, my advice is to use bytearray in Python 2, which behaves the same in Python 3. But you lose immutability, if that matters to you.
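One way this advice might look in practice: wrapping the input in bytearray makes iteration yield ints on both Python 2.6+ and Python 3.x. The checksum function below is a hypothetical helper invented for illustration.

```python
def checksum(data):
    """Sum the byte values of `data` modulo 256 (hypothetical helper)."""
    total = 0
    # bytearray elements are ints on both Python 2.6+ and Python 3.x,
    # so this loop behaves the same on either version.
    for value in bytearray(data):
        total = (total + value) % 256
    return total


print(checksum(b"hi"))  # 209, i.e. 104 + 105
```

Without the bytearray wrapper, iterating over b"hi" directly would give one-character strings in Python 2 and ints in Python 3.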
If you are dealing with ASCII or text, write your string as u'hi' in Python 2, which has the same meaning in Python 3. The u prefix has special meaning in Python 2: it tells Python 2 to treat the string literal as the unicode type. In Python 3 the prefix has no effect (it is accepted from Python 3.3 onward, but is a syntax error in 3.0 to 3.2), because all string literals in Python 3 are Unicode by default (the type is, confusingly, called str in Python 3 and unicode in Python 2).
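A small sketch of the u prefix in action. It assumes Python 2.6+ or Python 3.3+, and the variable names are illustrative.

```python
plain = 'hi'
prefixed = u'hi'  # unicode in Python 2, ordinary str in Python 3.3+

text_type = type(prefixed)
print(text_type.__name__)        # 'str' on Python 3, 'unicode' on Python 2
print(plain == prefixed)         # True on both versions for ASCII text
print(type(plain) is text_type)  # True on Python 3, False on Python 2
```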