how to remove "\x80" from strings

Question

I opened a CSV file with 'latin1' encoding, but the emojis in it are not read correctly. I want to remove all the emojis. Each one shows up as a square box, and when I convert the data to a list it appears as "\x80". Is there any way to remove this?

df = pd.read_csv(r"myfilepath", encoding='latin1')

"opened with 'latin1' encoding ... problem with reading emojis" The Latin1 encoding does not support emojis. If your file contains emojis, it's not Latin1 encoded. Do you know the appropriate encoding of your file, e.g. UTF-8? Why don't you use the correct encoding, but use Latin1 instead? — MisterMiyagi, Apr 16 '20 at 13:45
@MisterMiyagi This is the error message I get whenever I try to open the file with UTF-8. <> — dkdlfls26, Apr 17 '20 at 01:36
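A short sketch of where a stray "\x80" can come from, assuming (this is only a guess about the asker's file) that the emoji was stored as UTF-8 bytes and then decoded as latin1. The sample string and variable names are made up for illustration.

```python
# Hypothetical sample: a string containing one emoji (U+1F600).
original = "hi \U0001F600"

# The bytes as they would sit in a UTF-8 encoded file.
raw_bytes = original.encode("utf-8")

# latin1 maps every byte to one character, so decoding never fails,
# but the four emoji bytes become four unrelated characters.
misread = raw_bytes.decode("latin1")
print(repr(misread))

# Reversing the wrong decode recovers the text, but only when the
# underlying bytes really were valid UTF-8.
repaired = misread.encode("latin1").decode("utf-8")
print(repaired == original)
```

If the file is not valid UTF-8 (as the error mentioned above suggests it may not be), the last step raises UnicodeDecodeError and stripping the characters is the remaining option.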

score 0 · Accepted Answer · answered Apr 16 '20 at 13:31

Try an ASCII round trip. Note that this deletes every non-ASCII character (accented letters included), not only the emojis:

l_data = [x.encode('ascii', 'ignore').decode('ascii') for x in l_data]

If you want to remove a particular character:

l_data = [x.replace('\x80', '') for x in l_data]

Answer motivated by this
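To compare the two approaches above, here is a minimal sketch on a made-up list (the sample strings are assumptions, not the asker's data). It shows that the ASCII round trip also drops the "é", while the targeted replace removes only "\x80" and leaves the other mis-decoded characters in place.

```python
# Hypothetical sample: "café" followed by an emoji that was misread as latin1.
l_data = ["caf\xe9 \xf0\x9f\x98\x80", "plain text"]

# Approach 1: keep only ASCII characters.
ascii_only = [x.encode("ascii", "ignore").decode("ascii") for x in l_data]

# Approach 2: remove only the '\x80' character.
no_x80 = [x.replace("\x80", "") for x in l_data]

print(ascii_only)
print(no_x80)
```

Which one fits depends on whether the data contains legitimate non-ASCII text that should survive.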


Cblopez


Thank you for your answer! But could you please tell me what 'l_data' and 'x' in your code indicate in my case?? – dkdlfls26 Apr 16 '20 at 13:40
l_data is the name of the list of CSV lines. I think you named it X_data, my bad, but the important thing is that it identifies the list, whatever it's called. `x` is the variable that refers to each element inside the list. That syntax is called a list comprehension; you can have a look [here](https://www.pythonforbeginners.com/basics/list-comprehensions-in-python). It translates into "Create a list with all the elements of `l_data`, but before inserting each one, replace '\x80' with an empty string" – Cblopez Apr 16 '20 at 14:25
Thank you so much! Your code works and thanks again for a kind explanation! – dkdlfls26 Apr 17 '20 at 01:40
No problem! I would appreciate it if you could accept the answer. Thanks – Cblopez Apr 17 '20 at 07:31
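For readers new to list comprehensions, the explanation in this thread can be sketched as an explicit loop. The list contents here are invented for illustration; both versions should produce the same result.

```python
# Hypothetical sample data standing in for the list of CSV values.
l_data = ["price \x80 5", "no change"]

# Explicit loop: visit each element, clean it, append it to a new list.
cleaned_loop = []
for x in l_data:
    cleaned_loop.append(x.replace("\x80", ""))

# The same operation written as a list comprehension.
cleaned_comprehension = [x.replace("\x80", "") for x in l_data]

print(cleaned_loop == cleaned_comprehension)
```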

score 0 · Answer 2 · answered Apr 16 '20 at 13:37

Try this:

df = pd.read_csv(r"myfilepath", encoding='iso-8859-1')

See the link below:

UnicodeEncodeError : 'charmap' codec can't encode character '\x80' in position 0 : character maps to <undefined>


montessillo


ISO-8859-1 and latin-1 are the same, at least in Python – snakecharmerb Apr 16 '20 at 14:14
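The point in the comment above can be checked with the standard `codecs` module. This is a small sketch: it looks up a few alias spellings and compares how they decode every possible byte value.

```python
import codecs

# Resolve several spellings to their canonical codec name.
aliases = ("latin1", "latin-1", "iso-8859-1")
names = {codecs.lookup(alias).name for alias in aliases}
print(names)

# Decode all 256 byte values with both spellings and compare.
sample = bytes(range(256))
same = sample.decode("latin1") == sample.decode("iso-8859-1")
print(same)
```

Since both names resolve to one codec, switching between them in `pd.read_csv` should not change how the file is read.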
