How are these characters different?

Question

I'm not sure why these characters are different, since they appear to be the same visually. Are they different representations of the same character, or actually different characters? Is there a way to compare them that would result in True?

>>> s = u'\u2022' 
>>> ss = '•'
>>> s == ss
False
>>> print u'\u2022' , '•'
• •
>>> ss = unicode(ss)
>>> ss == s
False
>>> repr(ss)
"u'\\xe2\\x80\\xa2'"
>>> repr(s)
"u'\\u2022'"

They are the same character `BULLET`, this issue has to do with Python's `unicode` type, I guess. If you don't want to face such issues, you can switch to Python 3. — ForceBru, Jul 26 '17 at 22:03
Try the same thing in Python 3 and you will have your personal reason why you should use Python 3 instead of Python 2. — poke, Jul 26 '17 at 22:11
There is, in fact, a way to get them to compare equal. But you shouldn't use it. https://stackoverflow.com/questions/2276200/changing-default-encoding-of-python (Set the "default encoding" to UTF-8 instead of ASCII or Latin-1) — Josh Lee, Jul 26 '17 at 22:29

score 7 · Accepted Answer · 2017-07-26T22:05:39.753

u"\u2022" (your s) is a Unicode string (type unicode) containing the bullet character.

"\xe2\x80\xa2" (your ss) is a byte string (type str) containing the three bytes used to encode the bullet character as UTF-8.
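
To make the one-character versus three-bytes distinction concrete, here is a small sketch that should run on both Python 2.7 and Python 3. The variable names (`text`, `raw`, `byte_values`) are only illustrative and do not come from the question:

```python
import unicodedata

text = u"\u2022"             # a single code point, U+2022
raw = text.encode("utf-8")   # the UTF-8 encoding of that code point

# Iterating over a bytearray yields integers on both Python 2 and Python 3.
byte_values = [hex(b) for b in bytearray(raw)]

print(unicodedata.name(text))  # BULLET
print(len(text))               # 1 (one character)
print(len(raw))                # 3 (three bytes)
print(byte_values)             # ['0xe2', '0x80', '0xa2']
```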

You can convert one to the other using unicode.encode and str.decode:

>>> s_encode = s.encode("UTF-8")
>>> s_encode == ss
True

>>> ss_decode = ss.decode("UTF-8")
>>> ss_decode == s
True
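
If you want a comparison that works no matter which of the two types you were handed, one option is to decode any byte string to text first and then compare. The helper below is a hypothetical sketch: the names `to_text` and `same_text` are made up for illustration, and it assumes the bytes are UTF-8 encoded. It is written so that it should run on both Python 2.7 and Python 3:

```python
def to_text(value, encoding="utf-8"):
    """Return value as a Unicode string, decoding it first if it is bytes."""
    # On Python 2, bytes is an alias of str, so this matches byte strings only.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def same_text(a, b, encoding="utf-8"):
    """Compare two strings after converting both to Unicode text."""
    return to_text(a, encoding) == to_text(b, encoding)


s = u"\u2022"          # Unicode string
ss = b"\xe2\x80\xa2"   # UTF-8 encoded byte string

print(same_text(s, ss))  # True
```

Decoding once, at the point where bytes enter your program, and working with Unicode strings everywhere else usually avoids this kind of mismatch in the first place.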

edited Jul 26 '17 at 22:05

answered Jul 26 '17 at 22:03

This can be seen at http://www.fileformat.info/info/unicode/char/2022/index.htm under encodings – Nick is tired Jul 26 '17 at 22:04
