Good to know:
You are talking about two type of strings, byte string and unicode string. Each have a method to convert it to the other type of string. Unicode strings have a .encode() method that produces bytes, and byte strings have a .decode() method that produces unicode. It means:
unicode.enocde() ----> bytes
and
bytes.decode() -----> unicode
and UTF-8 is easily the most popular encoding for storage and transmission of Unicode. It uses a variable number of bytes for each code point. The higher the code point value, the more bytes it needs in UTF-8.
Get to the point:
If you redefine your string to two Byte strings and unicode strings, as follwos:
a_byte = b'Ope\xcc\x81rations'
a_unicode = u'Ope\xcc\x81rations'
and
b_byte = b'Op\xc3\xa9rations'
b_unicode = u'Op\xc3\xa9rations'
you w'll see:
print 'a_byte lenght is: ', len(a_byte.decode("utf-8"))
#print 'a_unicode lenght is: ',len(a_unicode.encode("utf-8"))
print 'b_byte lenght is: ',len(b_byte.decode("utf-8"))
#print 'b_unicode lenght is: ', len(b_unicode.encode("utf-8"))
output:
a_byte lenght is: 11
b_byte lenght is: 10
So you see they are not the same.
My solution:
If You don't want to be confused, then you can use repr(), and while print a_byte, b_byte printes Opérations
as output, but:
print repr(a_byte),repr(b_byte)
will return:
'Ope\xcc\x81rations','Op\xc3\xa9rations'
You can also normalize the unicode before comparison as @Daniel's answer, as follows:
from unicodedata import normalize
from functools import partial
a_byte = 'Opérations'
norm = partial(normalize, 'NFC')
your_string = norm(a_byte.decode('utf8'))