
I've written a Python 3 script that extracts C/C++/Java-style escape sequences for emoji characters from a text file. Each emoji appears as a UTF-16 surrogate pair (\ud83d\ude00 for 😀, for example).

I also have a dictionary in this script mapping emoji to their descriptions ("😀" => "grinning face"). How can I convert the surrogate pairs (\ud83d\ude00 as literal text) to their emoji counterparts, so that I can use them as keys to look up the corresponding descriptions in the dictionary?

For some additional information, I'm extracting the strings in such a way that when I run print(extracted_string), the console output is \ud83d\ude00. When I attempt to assign the value at the emoji key to a variable, I get back an error:

description = dictionary[extracted_string]
KeyError: '\\ud83d\\ude00'
sidd flinch

2 Answers


JSON uses the same \uXXXX surrogate-pair escapes, so json.loads can decode the text once it is wrapped in double quotes:

>>> import json
>>> json.loads('"\\ud83d\\ude00"')
'😀'
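
When the extracted text arrives without surrounding quotes, as in the question, it can be wrapped before it is handed to the JSON parser. The following is only a minimal sketch: `descriptions` and `decode_escapes` are hypothetical names standing in for the asker's dictionary and a helper, and it assumes the text contains nothing but \uXXXX escapes and plain characters (no raw double quotes or control characters).

```python
import json

# Hypothetical lookup table standing in for the asker's dictionary.
# "\U0001F600" is the grinning face emoji.
descriptions = {"\U0001F600": "grinning face"}


def decode_escapes(text):
    """Decode \\uXXXX escapes, including surrogate pairs, via the JSON parser.

    Assumes `text` holds no raw double quotes or control characters,
    because it is spliced into a JSON string literal unchanged.
    """
    return json.loads('"' + text + '"')


extracted_string = "\\ud83d\\ude00"  # backslash text, as read from the file
emoji = decode_escapes(extracted_string)
print(descriptions[emoji])
```

The decoded value is a single code point, so it matches the dictionary key, whereas the raw 12-character escape text does not.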
Josh Lee
  • For anyone else looking for this answer: the string *must* be formatted as above, with double quotes around the literal text of the surrogate pairs. So if the variable `emoji` holds the literal text `\ud83d\ude00`, it is necessary to set `emoji = '"' + emoji + '"'` before calling `json.loads`. Thank you for the answer, Josh! – sidd flinch Feb 22 '18 at 14:37

It took some digging and a whole bunch of encoding/decoding, but I've found something that works:

extracted_string = '\\ud83d\\ude00'  # string literal as read from the file
emoji = extracted_string.encode().decode('unicode-escape').encode('utf-16', 'surrogatepass').decode('utf-16')
print(emoji)

Output:

😀

This is slightly modified from @falestru's answer here: https://stackoverflow.com/a/26311382/1082235
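
The chain of codec calls can be unpacked step by step to see what each stage contributes. This is a sketch rather than a definitive recipe: the intermediate variable names are made up for illustration, and it assumes the extracted text is pure ASCII, since the unicode-escape codec treats any other bytes as Latin-1.

```python
extracted_string = "\\ud83d\\ude00"  # 12 characters of backslash text
assert len(extracted_string) == 12

# Step 1: interpret the \uXXXX escapes. The result is two lone surrogate
# code points, which is not yet the emoji itself.
surrogates = extracted_string.encode("ascii").decode("unicode-escape")
assert len(surrogates) == 2

# Step 2: write the surrogates out as UTF-16 code units. The default error
# handler rejects lone surrogates; "surrogatepass" lets them through.
utf16_bytes = surrogates.encode("utf-16", "surrogatepass")

# Step 3: decode the UTF-16 bytes, which joins the two code units into
# the single code point U+1F600.
emoji = utf16_bytes.decode("utf-16")
assert emoji == "\U0001F600"
print(hex(ord(emoji)))
```

The round trip through UTF-16 is what joins the pair. After step 1 alone, the string still holds two separate surrogates and would not match a dictionary key made of the real emoji.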

sidd flinch