0

I have been using an column where i have to remove non numeric characters from the column , however i have tried but not working in my case.

Input data

column1

675@12
##256H8\
A--5647R
NaN
222674
98AB 789

Expected Output

column1

67512
2568
5647
NaN
222674
98789

Code i have been using :

df['column1'] = re.sub("[^0-9^.]", "", str(df['column1']))

But its not working , Please suggest the possible way.

Manz
  • 593
  • 5
  • 23

1 Answers1

2

You can use pandas Series's vectorized counterpart of the re.sub method .str.replace to remove \D (match non numeric characters):

df.column1.str.replace('\D', '')

0     67512
1      2568
2      5647
3       NaN
4    222674
5     98789
Name: column1, dtype: object
Psidom
  • 209,562
  • 33
  • 339
  • 356
  • Thanks for the Answer, Sir, its cleaning the rows but it is also removing all the rows where only Numerical values are present. – Manz Jun 02 '21 at 16:54
  • What do you mean ? `222674` is pure numeric but still kept ? – Psidom Jun 02 '21 at 17:06
  • In my case the rows with only numeric characters are been replaced by NaN – Manz Jun 02 '21 at 17:09
  • I think those are not numeric characters but numbers. Try convert the entire column to str first: `df.column1.astype(str).str.replace('\D', '')` – Psidom Jun 02 '21 at 17:12