After decoding, I expected each 4-byte sequence of hex escapes to be replaced by a single Unicode escape. For instance, \xf0\x9f\x98\x8e should become \U0001F60E (the 8-digit \U form, since the code point does not fit in the 4 hex digits that \u allows). Why doesn't decode combine the 4-byte sequences? More specifically, if I want to search for a specific emoji, I'd like to use the \U form.
row3 = tweet_table.loc[3, 'tweet']
row3
'#model i love u take with u all the time in ur\xc3\xb0\xc2\x9f\xc2\x93\xc2\xb1!!! \xc3\xb0\xc2\x9f\xc2\x98\xc2\x99\xc3\xb0\xc2\x9f\xc2\x98\xc2\x8e\xc3\xb0\xc2\x9f\xc2\x91\xc2\x84\xc3\xb0\xc2\x9f\xc2\x91\xc2\x85\xc3\xb0\xc2\x9f\xc2\x92\xc2\xa6\xc3\xb0\xc2\x9f\xc2\x92\xc2\xa6\xc3\xb0\xc2\x9f\xc2\x92\xc2\xa6'
print(row3)
#model i love u take with u all the time in urð±!!! ððððð¦ð¦ð¦
len(row3)
116
type(row3)
str
row3_u = row3.decode('utf-8',errors="replace")
row3_u
u'#model i love u take with u all the time in ur\xf0\x9f\x93\xb1!!! \xf0\x9f\x98\x99\xf0\x9f\x98\x8e\xf0\x9f\x91\x84\xf0\x9f\x91\x85\xf0\x9f\x92\xa6\xf0\x9f\x92\xa6\xf0\x9f\x92\xa6'
len(row3_u)
84
print(row3_u)
#model i love u take with u all the time in urð±!!! ððððð¦ð¦ð¦
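One possible reading of the session above, offered as a sketch rather than a confirmed diagnosis: the raw string stores each emoji as 8 bytes (\xc3\xb0\xc2\x9f...), which is what you get when UTF-8 bytes are mistaken for Latin-1 and then encoded to UTF-8 a second time. Under that assumption, a single decode only peels off the outer layer, which would explain why the four \xf0\x9f\x98\x8e code points stay separate. The names raw, once, and fixed below are illustrative, and the snippet uses just one emoji from the tweet.

```python
# Assumption: the stored bytes are UTF-8 that was mis-read as Latin-1
# and then encoded as UTF-8 again (double encoding).
raw = b'\xc3\xb0\xc2\x9f\xc2\x98\xc2\x8e'  # one emoji exactly as it appears in row3

# First decode: undoes only the outer UTF-8 layer.
# The result is four separate code points: U+00F0, U+009F, U+0098, U+008E.
once = raw.decode('utf-8')

# Latin-1 maps code points 0-255 straight back to bytes 0-255,
# recovering the original 4-byte UTF-8 sequence, which is then decoded.
fixed = once.encode('latin-1').decode('utf-8')

print(len(once))                 # number of code points after one decode
print(fixed == u'\U0001F60E')    # compare against the 8-digit escape
print(u'\U0001F60E' in fixed)    # searching by escape works on the repaired text
```

If the assumption holds, the same chain applied to the whole tweet would be row3.decode('utf-8').encode('latin-1').decode('utf-8'). The encode('latin-1') step raises UnicodeEncodeError if the text contains any character above U+00FF, so it is only safe on rows that are damaged in this particular way.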