I have a DataFrame with a column that contains various string values:

col1

Hi
-Hi
+hi
=Hi

I would like to remove all of the non-alphanumeric characters from this column, to get this:

col1

Hi
Hi
hi
Hi

I know I can just do a str.replace with those specific characters, but to future-proof the script I would like to use something like isalpha(), since different non-alphabetic characters might show up in the future.

jpp
skimchi1993

2 Answers

You can use a list comprehension:

df['col1'] = [''.join([i for i in x if i.isalpha()]) for x in df['col1']]

print(df)

  col1
0   Hi
1   Hi
2   hi
3   Hi

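If the column contains NaN or float values, convert them to empty strings first. The first line blanks numeric values; the second fills NaN, which pd.to_numeric with errors='coerce' leaves as null and the first line therefore does not catch:

df.loc[pd.to_numeric(df['col1'], errors='coerce').notnull(), 'col1'] = ''
df['col1'] = df['col1'].fillna('')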
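
For reference, here is a self-contained sketch of the list-comprehension approach. The helper name `keep_chars` and its `predicate` parameter are hypothetical, introduced only for illustration; the sample data is the column from the question. Passing `str.isalnum` instead of the default `str.isalpha` should keep digits as well, which may be closer to "alphanumeric" as the question words it. Non-string values (NaN, floats) are assumed to map to an empty string.

```python
import pandas as pd


def keep_chars(value, predicate=str.isalpha):
    """Keep only the characters of `value` for which `predicate` is true.

    Hypothetical helper: non-string values (NaN, floats, ...) become ''.
    """
    if not isinstance(value, str):
        return ''
    return ''.join(ch for ch in value if predicate(ch))


# Sample column taken from the question
df = pd.DataFrame({'col1': ['Hi', '-Hi', '+hi', '=Hi']})

# Same idea as the list comprehension above, routed through the helper
df['col1'] = [keep_chars(x) for x in df['col1']]

print(df['col1'].tolist())  # ['Hi', 'Hi', 'hi', 'Hi']
```

Because the character test is a predicate rather than a fixed list of symbols, new unwanted characters do not require changes to the script.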
jpp

You can also use regular expressions:

df['col1'].str.findall(r'[a-zA-Z0-9]+').apply(lambda x: ''.join(x))

Output:

0  Hi
1  Hi
2  hi
3  Hi
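
A possibly simpler regex variant, offered as a sketch rather than a drop-in replacement: instead of collecting the allowed runs and joining them, delete everything outside the allowed set with `Series.str.replace` and `regex=True`. The variable name `cleaned` is only illustrative, and the character class assumes ASCII letters and digits are what should be kept.

```python
import pandas as pd

# Sample column taken from the question
df = pd.DataFrame({'col1': ['Hi', '-Hi', '+hi', '=Hi']})

# Remove every character that is not an ASCII letter or digit
cleaned = df['col1'].str.replace(r'[^a-zA-Z0-9]', '', regex=True)

print(cleaned.tolist())  # ['Hi', 'Hi', 'hi', 'Hi']
```

Assign the result back with `df['col1'] = cleaned` to update the column in place of the original values.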