
The unicode object u"ÿ" is given in Python. How can I convert it to the corresponding unicode escape syntax "\\u00FF"? Couldn't get unicode-escape to work here.

Edit: In my case a string object is given with the content r"\u00FF". On the other side I have a unicode object (from above) and I need to make a string comparison to check if they are equal. I need the unicode escape syntax as a string object from the unicode character from above to do that.

HelloWorld

2 Answers

>>> u"ÿ".encode('raw-unicode-escape')
'\xff'
Daniel Roseman
  • Unfortunately this returns only the utf-8 encoded string, not the equivalent unicode escape syntax. – HelloWorld Aug 20 '15 at 10:13
  • Please consider editing your post to add more explanation about what your code does and why it will solve the problem. An answer that mostly just contains code (even if it's working) usually won't help the OP to understand their problem. – SuperBiasedMan Aug 20 '15 at 10:29
  • @HelloWorld: that's not true. In a Unicode string, `\xNN` refers to Unicode character U+0000NN. `u'\xFF'` and `u'\u00FF'` are exactly the same value. – bobince Aug 20 '15 at 17:09
  • @HelloWorld Oh I actually meant Daniel, your posts were explained well. – SuperBiasedMan Aug 20 '15 at 20:51
  • That's correct: comparing the unicode objects gives equality, but your encode returns a string, so the string comparison then fails. `r"\xFF"` is not equal to `r"\u00FF"`. – HelloWorld Aug 20 '15 at 20:51
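
The comments above turn on the difference between comparing escape text and comparing characters. One way to sidestep it is to decode the escaped string and compare the resulting characters. The following is a minimal sketch, assuming Python 3 and an escaped string that contains only ASCII. The names `escaped`, `actual`, `decoded`, `raw` and `esc` are illustrative and do not appear in the original posts.

```python
# Python 3 sketch; all variable names are illustrative.
escaped = r"\u00FF"   # six characters: backslash, u, 0, 0, F, F
actual = "\u00ff"     # the single character ÿ

# Interpret the escape sequence, then compare text with text.
# Assumption: `escaped` is pure ASCII, otherwise encode("ascii") raises.
decoded = escaped.encode("ascii").decode("unicode_escape")
print(decoded == actual)

# For comparison, what the two escape codecs return for ÿ:
raw = actual.encode("raw_unicode_escape")  # a single byte 0xFF
esc = actual.encode("unicode_escape")      # the four bytes \, x, f, f
print(raw, esc)
```

Neither codec emits the `\u00FF` form for this character, which may explain why the codec route did not work for the asker.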
r"\u%04X" % ord(u"ÿ")

This did the trick for me. It returns a string object ('\\u00FF') which I can use for a string comparison. It fails for unicode characters above U+FFFF, but handling those is not necessary in my case.

HelloWorld
  • FYI: use this to convert an entire string: `(r"\u%04X" * len(string)) % tuple(map(ord, string))` – HelloWorld Aug 20 '15 at 10:22
  • This is fine for U+0000FF, but it won't generate usable output for characters above U+00FFFF. – bobince Aug 20 '15 at 17:08
  • You mean it's fine for U+00FF but it does not generate usable output for characters above `U+FFFF`. That was not needed in my case, but you are right that this information was missing. – HelloWorld Aug 20 '15 at 20:54
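
The limitation discussed in these comments can be handled by switching to the eight-digit `\UXXXXXXXX` form for code points above U+FFFF. This is a minimal sketch, assuming Python 3, where iterating over a string yields whole code points. The function name `to_unicode_escape` is hypothetical and not taken from the answer.

```python
def to_unicode_escape(text):
    """Return `text` with every character written as an escape sequence.

    Hypothetical helper: uses \\uXXXX up to U+FFFF and \\UXXXXXXXX above it.
    """
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFFFF:
            parts.append("\\u%04X" % cp)   # four hex digits
        else:
            parts.append("\\U%08X" % cp)   # eight hex digits
    return "".join(parts)


print(to_unicode_escape("ÿ"))
print(to_unicode_escape("\U0001F600"))
```

Note that the hex digits come out uppercase, so a comparison against an escaped string written with lowercase digits would need case normalization first.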