
According to the Unicode standard, these two strings are supposed to be equal because they are canonically equivalent ( https://en.wikipedia.org/wiki/Unicode_equivalence#Errors_due_to_normalization_differences ):

s1 = "\u006E\u0303"
s2 = "\u00F1"
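The equivalence is recorded in the Unicode Character Database: U+00F1 has a canonical decomposition into U+006E followed by U+0303, which is exactly the content of s1. A small sketch using the standard-library `unicodedata` module to check this:

```python
import unicodedata

s1 = "\u006E\u0303"  # LATIN SMALL LETTER N + COMBINING TILDE (two code points)
s2 = "\u00F1"        # LATIN SMALL LETTER N WITH TILDE (one precomposed code point)

# The canonical decomposition of U+00F1 is the code point sequence stored in s1.
print(unicodedata.decomposition(s2))  # prints: 006E 0303

# The second code point of s1 is a combining mark (non-zero combining class).
print(unicodedata.name(s1[1]))        # prints: COMBINING TILDE
print(unicodedata.combining(s1[1]) != 0)  # prints: True
```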

But s1 == s2 returns False. I can't find anything on the Python website that states what == means for Unicode strings in Python. Is it documented anywhere?
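For reference, the Python Language Reference (the "Value comparisons" part of the Expressions chapter) states that strings compare lexicographically using the numeric Unicode code points of their characters, i.e. the values returned by `ord()`; no normalization is applied. A minimal sketch that makes the code points visible (the `code_points` helper is hypothetical, written only for illustration):

```python
s1 = "\u006E\u0303"
s2 = "\u00F1"

def code_points(s):
    """Hypothetical helper: return the code points of s as 'U+XXXX' strings."""
    return [f"U+{ord(c):04X}" for c in s]

print(code_points(s1))  # prints: ['U+006E', 'U+0303']
print(code_points(s2))  # prints: ['U+00F1']
print(len(s1), len(s2))  # prints: 2 1  (len counts code points, not user-perceived characters)

# == compares code point by code point, so it agrees with comparing these lists.
print(s1 == s2, code_points(s1) == code_points(s2))  # prints: False False
```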

For comparison, Swift documents this clearly ( https://developer.apple.com/library/content/documentation/Swift/Conceptual/Swift_Programming_Language/StringsAndCharacters.html ). I can't find anything similar for Python.

InsideLoop
  • `==` for Unicode strings compares the codepoints (*scalars* in Swift) in the strings, nothing else. There is no support for extended grapheme clusters (*characters*) like Swift has. – Martijn Pieters May 23 '17 at 10:00
  • Equivalence and equality are different things. – Stefan Pochmann May 23 '17 at 10:02
  • Martijn: Is there anything in the documentation that states that? Stefan: I agree that equality is ill-defined for Unicode strings, but the Wikipedia page states that "Therefore, those sequences should be displayed in the same manner, should be treated in the same way by applications such as alphabetizing names or searching, and may be substituted for each other". – InsideLoop May 23 '17 at 10:05
  • This might explain it a bit, at least it says "... they may not compare equal". https://docs.python.org/3.6/library/unicodedata.html#unicodedata.normalize – Stefan Pochmann May 23 '17 at 10:10
  • Stefan: thanks. I am now very puzzled as to what regular expressions do. I still believe that the Python documentation is very fuzzy here. Thanks for your help and comments. – InsideLoop May 23 '17 at 10:13
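Following the pointer to `unicodedata.normalize`, one way to get an equivalence-aware comparison is to normalize both strings to the same normalization form before using `==`. The helper name `canonically_equal` is hypothetical. The standard `re` module also matches on code points, so, as a sketch under that assumption, the same normalization step is needed before searching:

```python
import re
import unicodedata

s1 = "\u006E\u0303"  # n + COMBINING TILDE (decomposed)
s2 = "\u00F1"        # precomposed n with tilde

def canonically_equal(a, b, form="NFC"):
    """Hypothetical helper: compare two strings up to canonical equivalence.

    NFC and NFD give the same answer for canonical equivalence;
    NFKC/NFKD would additionally fold compatibility variants.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(s1 == s2)                          # prints: False (different code points)
print(canonically_equal(s1, s2))         # prints: True
print(canonically_equal(s1, s2, "NFD"))  # prints: True

# re compares code points too: the one-code-point pattern does not match s1 ...
print(re.search(s2, s1))                 # prints: None
# ... until pattern and text are normalized to the same form.
pattern = unicodedata.normalize("NFC", s2)
text = unicodedata.normalize("NFC", s1)
print(re.search(pattern, text) is not None)  # prints: True
```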
