I have a CSV file (see here) that contains metadata from posts on a public Facebook page. I need to decode all the content like: \xc3\xa9
and \xf0\x9f\x91\xa9\xf0\x9f\x8f\xbb\xe2\x80\x8d\xf0\x9f\x92\xbc
The "post message" metadata field is:
"b'Bom dia, genteee! Me disseram que esse emoji \xc3\xa9 a minha cara: \xf0\x9f\x91\xa9\xf0\x9f\x8f\xbb\xe2\x80\x8d\xf0\x9f\x92\xbc\nO que voc\xc3\xaas acham?'"
and its type is str.
I need to convert it to:
Bom dia, genteee! Me disseram que esse emoji é a minha cara: 👩🏻‍💼 O que vocês acham?
How do I do this? I need to convert the whole CSV.
Edit 1: I tried
My_string = post_message.split("b'")[1].split("'")[0]
My_string.encode().decode('unicode_escape')
but the result is different from what I expected:
Bom dia, genteee! Me disseram que esse emoji é a minha cara: ð©ð»âð¼ O que vocês acham?
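As far as I can tell, the emoji breaks because unicode_escape turns every \xNN escape into its own character instead of treating the four escapes as one UTF-8 sequence. A minimal sketch using just the last emoji from the post (the variable names are only for illustration):

```python
# One emoji from the post, written as the escaped text found in the CSV cell.
raw = r"\xf0\x9f\x92\xbc"

# unicode_escape maps each \xNN escape to the single code point U+00NN,
# so the four UTF-8 bytes become four separate characters.
decoded = raw.encode().decode("unicode_escape")

print(len(decoded))                      # 4 characters instead of 1
print([hex(ord(ch)) for ch in decoded])  # ['0xf0', '0x9f', '0x92', '0xbc']
```

Two of those four characters are invisible control characters, which would explain why only "ð" and "¼" show up in the output above.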
Solution:
As @Ben pointed out, my data is a str object that contains the text representation of bytes, not a bytes object. So I used @ShadowRanger's solution (see his answer here). I did:
My_string = post_message[2:-1]  # remove "b'" from the beginning and "'" from the end
My_string = My_string.encode('utf-8').decode('unicode_escape').encode('latin-1').decode('utf-8')
The result:
Bom dia, genteee! Me disseram que esse emoji é a minha cara: 👩🏻‍💼 O que vocês acham?
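To convert every row rather than a single string, something like the sketch below might work. It is not the slicing approach above: it swaps in ast.literal_eval to parse the "b'...'" text back into a real bytes object, which is then decoded as UTF-8. The helper name decode_bytes_repr, the column name "post_message", and the in-memory sample data are all assumptions standing in for the real file.

```python
import ast
import csv
import io

def decode_bytes_repr(text):
    """Turn a str such as "b'voc\\xc3\\xaas'" into the text it encodes."""
    # Leave cells alone unless they look like the repr of a bytes object.
    if not (text.startswith("b'") or text.startswith('b"')):
        return text
    # literal_eval safely parses the bytes literal; decode() reads it as UTF-8.
    return ast.literal_eval(text).decode("utf-8")

# Hypothetical in-memory CSV standing in for the real file; the column
# name "post_message" is an assumption.
sample = io.StringIO(newline="")
writer = csv.writer(sample)
writer.writerow(["post_id", "post_message"])
writer.writerow(["1", repr(b"O que voc\xc3\xaas acham? \xf0\x9f\x92\xbc")])
sample.seek(0)

# Decode the message column of every row.
rows = []
for row in csv.DictReader(sample):
    row["post_message"] = decode_bytes_repr(row["post_message"])
    rows.append(row)

print(rows[0]["post_message"])
```

For the real file, the io.StringIO object would be replaced by open(path, newline="", encoding="utf-8"), with path pointing at the CSV. Cells that do not start with b' or b" are returned unchanged, so other columns could be passed through the same helper.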