The thumbs-up character (U+1F44D) is escaped as \ud83d\udc4d in JavaScript and as \U0001f44d in Python (reference: http://www.charbase.com/1f44d-unicode-thumbs-up-sign).

Here's the scenario:

I receive it as \ud83d\udc4d from the front end, and I want to print it on the server side. How can I do that?

>>> print('\ud83d\udc4d')
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud83d' in position 0: surrogates not allowed

>>> print('\U0001f44d')
👍

To be more precise, I want to save the data to MongoDB using pymongo:

>>> from pymongo import MongoClient
>>> MongoClient().db.collection.insert_one({'thumbs_up': '\ud83d\udc4d'})
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud83d' in position 0: surrogates not allowed

However, everything works fine if I convert it to \U0001f44d first.
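
One way to do that conversion in Python 3 is to round-trip the string through UTF-16 with the `surrogatepass` error handler, which pairs the two surrogates into a single code point. This is a minimal sketch, not necessarily what the front end or pymongo expects; the helper name `fix_surrogates` is made up for illustration.

```python
def fix_surrogates(text: str) -> str:
    """Combine UTF-16 surrogate pairs inside a str into real code points.

    Hypothetical helper for illustration only.
    """
    # 'surrogatepass' lets the surrogates through the UTF-16 encoder;
    # decoding the resulting bytes as UTF-16 then joins each valid pair.
    return text.encode('utf-16', 'surrogatepass').decode('utf-16')


broken = '\ud83d\udc4d'          # two surrogate code units, as received
fixed = fix_surrogates(broken)   # one code point, U+1F44D

print(fixed == '\U0001f44d')     # True
print(len(broken), len(fixed))   # 2 1
```

Strings without surrogates pass through unchanged. An unpaired surrogate would still fail at the decode step, so input that may be malformed needs its own error handling.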

amigcamel
    How are you scraping this data? Just treat it as *JSON*, if at all possible. Otherwise, see the duplicate. – Martijn Pieters Dec 05 '16 at 09:47
    The escape you've listed from JavaScript is UTF-16; the Unicode Consortium [publishes the algorithm](http://www.unicode.org/faq/utf_bom.html#utf16-3) for getting from there to the code point. – T.J. Crowder Dec 05 '16 at 09:49
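
Both comments can be illustrated with a short sketch. The first part applies the standard surrogate-pair arithmetic by hand; the second assumes the front-end payload is JSON text and lets the standard-library `json` module pair the surrogates while parsing. The function name `combine_surrogates` is hypothetical.

```python
import json


def combine_surrogates(high: int, low: int) -> int:
    """Map a UTF-16 surrogate pair to the code point it encodes."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # The high surrogate carries the top 10 bits, the low surrogate the
    # bottom 10 bits, offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xD83D, 0xDC4D)
print(hex(code_point))                 # 0x1f44d

# Assumption: the payload arrives as JSON text with the escapes still
# written out literally (backslash, 'u', four hex digits).
decoded = json.loads('"\\ud83d\\udc4d"')
print(decoded == chr(code_point))      # True
```

If the data really is JSON, parsing it as JSON is likely the simpler route, since no manual arithmetic is needed.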
