How to convert surrogate pairs read from txt files back to emojis in python 3?

Question

I have a few .txt files to read that contain strings such as:

"Yes! Sardines in a can distancing! \uD83E\uDD23"

The problem is this: when I run

"Yes! Sardines in a can distancing! \uD83E\uDD23".encode('utf-16', 'surrogatepass').decode('utf-16')

the surrogate pair is converted to the emoji, because in a string literal Python reads \uD83E and \uDD23 as two single characters (one code point each).

Output:

Yes! Sardines in a can distancing! 🤣

Also, when I check the length of the above string with len(), the output is 37.
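A quick way to see what happens with the literal: in source code each `\uXXXX` escape becomes a single code point, so the literal ends with two lone surrogates, and the `surrogatepass` round trip through UTF-16 merges them into one character (U+1F923). A minimal check, printing only lengths and comparisons so it does not depend on the console being able to display emoji:

```python
# In source code, each \uXXXX escape becomes ONE code point, so the
# literal ends with two lone surrogate code points (U+D83E, U+DD23).
s = "Yes! Sardines in a can distancing! \uD83E\uDD23"
print(len(s))  # 37

# 'surrogatepass' lets the UTF-16 encoder write the lone surrogates as-is;
# the resulting bytes form a valid surrogate pair, which the UTF-16
# decoder reads back as a single code point.
joined = s.encode("utf-16", "surrogatepass").decode("utf-16")
print(len(joined))                 # 36
print(joined[-1] == "\U0001F923")  # True: U+1F923 is the emoji
```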

However, when I read the same string from a text file, Python reads \uD83E and \uDD23 as plain characters, i.e. 12 characters in total (six per escape). I do not want that, because then my encode().decode() call does not give the expected result: the escapes are not converted to the emoji. I used the code below:

for index, tweet in enumerate(tweet_dict):
    if index == 75:
        a = tweet['text']
        print('Length of the string is: ', len(a))
        print(a.encode('utf-16', 'surrogatepass').decode('utf-16'))

Output is:

Length of the string is:  47
Yes! Sardines in a can distancing! \uD83E\uDD23
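Text read from a file keeps each escape as six ordinary characters (`\`, `u`, and four hex digits), which explains the length of 47 (35 + 12); a raw string literal behaves the same way, so below it stands in for the file contents. Since `a` is already a `str`, one way to apply the `unicode-escape` idea from the comments is to go back to bytes with `raw_unicode_escape` (Latin-1 characters stay single bytes, other non-ASCII characters become `\uXXXX`, existing backslashes are left alone), decode the escapes with `unicode_escape`, and then do the `surrogatepass` round trip. A minimal sketch, assuming the text has no other backslash sequences (such as `\n`) that must stay literal, since `unicode_escape` decodes those too; `join_escaped_surrogates` is a hypothetical helper name:

```python
def join_escaped_surrogates(text):
    # Hypothetical helper: turn literal \uXXXX escapes into real characters.
    # Step 1: back to bytes; Latin-1 chars stay single bytes, other non-ASCII
    #         chars become \uXXXX, existing backslashes are NOT escaped.
    # Step 2: 'unicode_escape' turns each \uXXXX into one code point,
    #         giving two lone surrogates for \uD83E\uDD23.
    decoded = text.encode("raw_unicode_escape").decode("unicode_escape")
    # Step 3: the UTF-16 round trip merges each surrogate pair.
    return decoded.encode("utf-16", "surrogatepass").decode("utf-16")


# Raw string: backslashes kept literally, like text read from a file.
from_file = r"Yes! Sardines in a can distancing! \uD83E\uDD23"
print(len(from_file))             # 47
fixed = join_escaped_surrogates(from_file)
print(len(fixed))                 # 36
print(fixed[-1] == "\U0001F923")  # True
```

In the loop above this would be `print(join_escaped_surrogates(a))`.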

This doesn't match my scenario, as I cannot import any standard library module such as `unicodedata` or `ast`. — Aniruddha, Aug 24 '20 at 03:51
Why can't you import things like `unicodedata` and `ast`? They should come packaged along with your python installation, and are considered part of the 'standard library', so there shouldn't be anything stopping you, in particular. You shouldn't have to install anything. — Green Cloak Guy, Aug 24 '20 at 03:54
Alternatively, have you tried typing an actual emoji in the text file and letting the computer save it however it wants, rather than literally typing `\uD83E\uDD23` in the text file? — Green Cloak Guy, Aug 24 '20 at 03:55
The answers to [this question](https://stackoverflow.com/q/38147259/5320906) are also relevant (and another possible duplicate). — snakecharmerb, Aug 24 '20 at 07:03
Based on the answers I linked to, you can either do `json.load(open('myfile.txt'))` or if you don't want to import the json module, `open('myfile.txt', 'rb').read().decode('unicode-escape').encode('utf-16', 'surrogatepass').decode('utf-16')`. — snakecharmerb, Aug 24 '20 at 08:05
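If the `unicode-escape` codec is unwanted (it also rewrites sequences such as `\n`, and on raw file bytes it treats non-ASCII bytes as Latin-1), the escapes can be decoded by hand with only built-ins, which also fits the no-imports constraint. This is a swapped-in technique, not taken from the linked answers, and `decode_u_escapes` / `_read_u_escape` are hypothetical names. It handles only `\uXXXX` with exactly four hex digits and combines a high surrogate (U+D800 to U+DBFF) with a following low surrogate (U+DC00 to U+DFFF) using the UTF-16 rule 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00):

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def _read_u_escape(text, i):
    # Hypothetical helper: value of a \uXXXX escape starting at index i,
    # or None if there is no complete escape there.
    digits = text[i + 2:i + 6]
    if (text.startswith("\\u", i) and len(digits) == 4
            and all(c in HEX_DIGITS for c in digits)):
        return int(digits, 16)
    return None


def decode_u_escapes(text):
    # Hypothetical helper: replace \uXXXX escapes with real characters,
    # merging escaped surrogate pairs; other backslashes are left as-is.
    out = []
    i = 0
    while i < len(text):
        code = _read_u_escape(text, i)
        if code is None:
            out.append(text[i])
            i += 1
            continue
        i += 6
        if 0xD800 <= code <= 0xDBFF:
            # High surrogate: merge with an immediately following low one.
            low = _read_u_escape(text, i)
            if low is not None and 0xDC00 <= low <= 0xDFFF:
                code = 0x10000 + ((code - 0xD800) << 10) + (low - 0xDC00)
                i += 6
        out.append(chr(code))  # unpaired surrogates are kept as-is
    return "".join(out)
```

In the question's loop this would be `print(decode_u_escapes(a))`. An escaped backslash followed by `u` would still be treated as an escape, so this only suits text like the tweets shown.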
