Python 3 - How are the emojis and unicode handled and read in Python? A test

Question

I have some sentences containing words and emojis, and my goal is to convert the emojis into their descriptions.

Example: "😊 Hello!" will be converted to "smiling_face_with_smiling_eyes Hello!"

I am not comfortable with encoding and decoding, and I have run into some issues. Thanks to another post here, "Converting emojis to unicode and viceversa", I think I may have found the solution. Still, I don't understand what is going on or why I should do this. I would appreciate an explanation.

I will show you two tests; the first one failed. Could you explain why?

# -*- coding: UTF-8 -*-
unicode = u"\U0001f600"
string = u"\U0001f600 Hello world"
print("SENT: "+string)

OUTPUT: SENT: 😀 Hello world

Test 1 (FAIL):

if string.find(unicode):
   print("after: "+string.replace(unicode,"grinning_face_with_sweat"))
else:
   print("not found : "+unicode)

OUTPUT: not found : 😀

Test 2:

if string.find(unicode.encode('unicode-escape').decode('ASCII')):
   print(string.replace(unicode,"grinning_face_with_sweat"))
else:
   print("not found : "+unicode)

OUTPUT: grinning_face_with_sweat Hello world

To check whether a string is contained in another one, in Python you do `x in y`, ie. `if unicode in string:` in your code. It's shorter and easier to read. — lenz, Apr 20 '20 at 18:10
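The membership test suggested in this comment could look like the sketch below. The names `emoji` and `text` are my own stand-ins for the question's `unicode` and `string` variables, chosen so they don't collide with the `string` module or the Python 2 `unicode` built-in.

```python
emoji = "\U0001f600"
text = "\U0001f600 Hello world"

# "in" yields True or False, so there is no index value to misread
if emoji in text:
    result = text.replace(emoji, "grinning_face_with_sweat")
else:
    result = "not found"

print(result)
```

Because `in` returns a real boolean, a match at the very start of the string behaves the same as a match anywhere else.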

score 1 · Accepted Answer · answered Apr 20 '20 at 18:07

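Since the text in unicode is at the beginning of string, string.find(unicode) returns 0, and Python treats 0 as false in an if test, so the else branch runs. If the text is not found, find returns -1, which counts as true. That is also why Test 2 only appears to work: the escaped text is not in string, so find returns -1 and the if branch runs. Your code should be: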

if string.find(unicode) != -1:
   print("after: "+string.replace(unicode,"grinning_face_with_sweat"))
else:
   print("not found : "+unicode)

BTW, are you still using Python 2? I strongly suggest switching to Python 3. And if you're using Python 3, there's no need to precede strings with u, since all strings in Python 3 are Unicode.
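As for the original goal of turning emojis into descriptions, one possible approach is the standard library's unicodedata.name, which returns the official Unicode character name. The sketch below makes a loud assumption: it treats every character in the general category "So" (Symbol, other) as an emoji, which is a simplification of mine and will also match non-emoji symbols such as ©. The helper name describe_emojis is hypothetical.

```python
import unicodedata

def describe_emojis(text):
    """Replace each 'So' character with its lowercased Unicode name.

    Assumption: category 'So' is used as a rough stand-in for "emoji".
    """
    parts = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            # The second argument is a default, so unnamed characters don't raise
            name = unicodedata.name(ch, None)
            if name is not None:
                parts.append(name.lower().replace(" ", "_"))
                continue
        parts.append(ch)
    return "".join(parts)

print(describe_emojis("\U0001f60a Hello!"))  # smiling_face_with_smiling_eyes Hello!
```

The names come from the Unicode database, so they may differ from labels used elsewhere, and emojis built from several code points (skin tones, joined sequences) are handled one code point at a time here.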
