How to remove unusual special characters from a pandas dataframe using Python

Question

I have a file with some crazy stuff in it. It looks like this:

I attempted to get rid of it using this:

df['firstname'] = map(lambda x: x.decode('utf-8','ignore'), df['firstname'])

But I wound up with this in my dataframe: <map object at 0x0000022141F637F0>

I got that example from another question, and it seems to be the Python 3 way of doing this, but I'm not sure what I'm doing wrong.

Edit: For some odd reason, someone thinks that this has something to do with getting a map to return a list. The central issue is getting rid of non-UTF-8 characters. Whether or not I'm even doing that correctly has yet to be established.

As I understand it, I have to apply an operation to every character in a column of the dataframe. Is there another technique, or is map the correct way? If it is, why am I getting the output I've indicated?
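For reference, here is a minimal sketch of one way to apply a per-value cleanup to a column. It assumes the goal is to keep only ASCII characters, which may be stricter than what is actually wanted. The helper name keep_ascii is hypothetical, and the sample row is taken from the example data further down. Python's built-in map() returns a lazy iterator, which is why assigning its result to a column shows a map object; the pandas Series.map method returns a new Series instead.

```python
import pandas as pd

# Sample values copied from the example data in this question.
df = pd.DataFrame({"firstname": ["∩┐╜∩┐╜Grain", "Don"]})

def keep_ascii(text):
    """Hypothetical helper: drop every character that is not plain ASCII."""
    return "".join(ch for ch in text if ch.isascii())

# Series.map applies the function to each value and returns a Series,
# unlike the built-in map(), which returns a lazy iterator object.
df["firstname"] = df["firstname"].map(keep_ascii)

cleaned = df["firstname"].tolist()
print(cleaned)
```

Note that str.isascii requires Python 3.7 or later.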

Edit2: For some reason, my machine wouldn't let me create an example. I can now. This is what I'm dealing with. All those weird characters need to go.

import pandas as pd

data = [['≡ƒªÄAle','╬æ╬╗╬¡╬╛╬▒╬╜╬┤╧ü╬▒'],['∩┐╜∩┐╜Grain','Girl≡ƒî╛'],['─É├┤╠â Vu╠â','├¬n Anh'],['Don','Johnson']]
df = pd.DataFrame(data,columns=['firstname','lastname'])

print(df)

Edit 3: I tried doing this using a regex, and for some reason it still didn't work.

df['firstname'] = df['firstname'].replace('[^a-zA-z\s]',' ')

This regex works FINE in another process, but here, it still leaves the ugly characters.
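A possible explanation, sketched under the assumption that the intent is to replace every non-letter, non-whitespace character with a space: Series.replace treats a string pattern as a literal value to match against whole cells unless regex=True is passed. The character class A-z (lowercase z) also matches a few punctuation characters that sit between Z and a in ASCII, so the sketch uses A-Z instead. The sample row is taken from the example data above.

```python
import pandas as pd

# Sample values copied from the example data in this question.
df = pd.DataFrame({"firstname": ["∩┐╜∩┐╜Grain", "Don"]})

pattern = r"[^a-zA-Z\s]"

# Without regex=True the pattern is compared literally against each whole
# cell, so nothing matches and the column comes back unchanged.
literal = df["firstname"].replace(pattern, " ")

# With regex=True each character outside the class is replaced by a space.
as_regex = df["firstname"].replace(pattern, " ", regex=True)

print(literal.tolist())
print(as_regex.str.strip().tolist())
```

The replacement leaves leading spaces where the removed characters were, so a final .str.strip() may be wanted.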

Edit 4: It turns out that it's image data that we're looking at.

Possible duplicate of [Getting a map() to return a list in Python 3.x](https://stackoverflow.com/questions/1303347/getting-a-map-to-return-a-list-in-python-3-x) — glibdud, Mar 12 '19 at 17:08
Using the top answer from that question yields: list(map(lambda x: x.decode('utf-8','ignore'), df['firstname'])) which throws: AttributeError: 'str' object has no attribute 'decode'. — Bob Wakefield, Mar 12 '19 at 17:38
If you're having trouble following another question, it might be helpful to link to the other question. The dupe candidate gets you past the problem of having a map object where you didn't expect it. You're now on to the next problem, which is difficult to help with without seeing an [MCVE](https://stackoverflow.com/help/mcve). — glibdud, Mar 12 '19 at 18:21
Earlier when I tried to do an example, my machine wouldn't let me copy and paste in the questionable characters. It will now. Hang on a bit and I'll work up an example. — Bob Wakefield, Mar 12 '19 at 18:56
I obviously do not know all the utf-8 characters, but some of them look like utf-8 to me.. For example: ╬. Are you sure you are asking the right question? Do you perhaps want to remove all non alphanumeric characters? — Koray Tugay, Mar 12 '19 at 19:19
Can you also please add the code to show how you are actually trying to remove them? — Koray Tugay, Mar 12 '19 at 19:20
@KorayTugay df['firstname'] = map(lambda x: x.decode('utf-8','ignore'), df['firstname']) that's the code that I'm trying to use to remove them. I may not be asking the right question. Let me update. — Bob Wakefield, Mar 12 '19 at 21:52
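On the AttributeError mentioned in the comments: in Python 3, str values are already decoded text and have no decode method; only bytes do. A rough sketch of the encode-then-decode round trip follows, assuming that dropping everything outside ASCII is acceptable. The variable names are illustrative only.

```python
# Sample value copied from the example data in this question.
raw = "∩┐╜∩┐╜Grain"

# str has encode(); bytes has decode(). Encoding to ASCII with
# errors="ignore" silently drops characters that ASCII cannot represent.
ascii_bytes = raw.encode("ascii", "ignore")
ascii_only = ascii_bytes.decode("ascii")

print(ascii_only)
```

The same round trip is available column-wide in pandas through the .str.encode and .str.decode accessors.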

0 Answers