I'm not sure why these characters compare as different, since they look identical. Are they different representations of the same character, or actually different characters? Is there a way to compare them that would result in True?

>>> s = u'\u2022' 
>>> ss = '•'
>>> s == ss
False
>>> print u'\u2022' , '•'
• •
>>> ss = unicode(ss)
>>> ss == s
False
>>> repr(ss)
"u'\\xe2\\x80\\xa2'"
>>> repr(s)
"u'\\u2022'"
JacobIRR
  • Try `ss.decode('utf-8') == s` – cs95 Jul 26 '17 at 22:02
  • They are the same character `BULLET`, this issue has to do with Python's `unicode` type, I guess. If you don't want to face such issues, you can switch to Python 3. – ForceBru Jul 26 '17 at 22:03
  • Try the same thing in Python 3 and you will have your personal reason why you should use Python 3 instead of Python 2. – poke Jul 26 '17 at 22:11
  • There is, in fact, a way to get them to compare equal. But you shouldn't use it. https://stackoverflow.com/questions/2276200/changing-default-encoding-of-python (Set the "default encoding" to UTF-8 instead of ASCII or Latin-1) – Josh Lee Jul 26 '17 at 22:29

1 Answer

u"\u2022" (your s) is a Unicode string (type unicode) containing the bullet character.

"\xe2\x80\xa2" (your ss) is a byte string (type str) containing the three bytes used to encode the bullet character as UTF-8.

You can convert one to the other using unicode.encode and str.decode:

>>> s_encode = s.encode("UTF-8")
>>> s_encode == ss
True

>>> ss_decode = ss.decode("UTF-8")
>>> ss_decode == s
True