How to remove non numeric characters from an column

Question

I have been using an column where i have to remove non numeric characters from the column , however i have tried but not working in my case.

Input data

column1

675@12
##256H8\
A--5647R
NaN
222674
98AB 789

Expected Output

column1

67512
2568
5647
NaN
222674
98789

Code i have been using :

df['column1'] = re.sub("[^0-9^.]", "", str(df['column1']))

But its not working , Please suggest the possible way.

score 2 · Accepted Answer · answered Jun 02 '21 at 16:50

2

You can use pandas Series's vectorized counterpart of the re.sub method .str.replace to remove \D (match non numeric characters):

df.column1.str.replace('\D', '')

0     67512
1      2568
2      5647
3       NaN
4    222674
5     98789
Name: column1, dtype: object

answered Jun 02 '21 at 16:50

Psidom

Thanks for the Answer, Sir, its cleaning the rows but it is also removing all the rows where only Numerical values are present. – Manz Jun 02 '21 at 16:54
What do you mean ? `222674` is pure numeric but still kept ? – Psidom Jun 02 '21 at 17:06
In my case the rows with only numeric characters are been replaced by NaN – Manz Jun 02 '21 at 17:09
I think those are not numeric characters but numbers. Try convert the entire column to str first: `df.column1.astype(str).str.replace('\D', '')` – Psidom Jun 02 '21 at 17:12

1 Answers1