I was happy in my Python world, knowing that I was doing everything in Unicode and encoding as UTF-8 whenever I needed to output something to a user. Then one of my colleagues sent me the "UTF-8 Everywhere" manifesto (2012), and it confused me.
- The author of the article claims a number of times that UCS-2, the Unicode representation that Python uses, is synonymous with UTF-16.
- He even goes as far as directly saying Python uses UTF-16 for internal string representation.
- The author also admits to being a Windows lover and developer, and states that the way MS has handled character encodings over the years has left that group the most confused, so maybe this is just his own confusion. I don't know...
Can somebody please explain what the state of UTF-16 vs Unicode is in Python? Are they synonymous, and if not, how do they differ?
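
To make the question concrete, here is a minimal sketch of the distinction I am asking about. It assumes Python 3.3 or later, and `describe` is just a name I made up for illustration. It compares the length of a `str` (counted in code points) with the number of 16-bit code units the same text takes once it is explicitly encoded as UTF-16, and with its size in UTF-8 bytes.

```python
import sys

def describe(text):
    """Return (code points, UTF-16 code units, UTF-8 bytes) for text."""
    code_points = len(text)                           # length of the str itself
    utf16_units = len(text.encode("utf-16-le")) // 2  # 2 bytes per code unit, no BOM
    utf8_bytes = len(text.encode("utf-8"))
    return code_points, utf16_units, utf8_bytes

# Largest code point a str can hold on this interpreter.
print(hex(sys.maxunicode))

print(describe("a"))           # a character inside the Basic Multilingual Plane
print(describe("\U0001F600"))  # a character outside it (needs a surrogate pair in UTF-16)
```

If the string type really were UTF-16 under the hood, I would expect the first two numbers to always match. On older "narrow" builds, as far as I understand, `sys.maxunicode` was `0xFFFF` and the results for the second call could look different.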