I want to replace all numerical values in a DataFrame column with NaN

Input:

A       B       C
test    foo     xyz
hit     bar     10
hit     fish    90
hit     NaN     abc
test    val     20
test    val     90

Desired Output:

A       B       C
test    foo     xyz
hit     bar     NaN
hit     fish    NaN
hit     NaN     abc
test    val     NaN
test    val     NaN

I tried the following:

db_old.loc[db_old['Current Value'].istype(float), db_old['Current Value']] = np.nan

but it returns:

AttributeError: 'Series' object has no attribute 'istype'

Any suggestions?

Thanks

Maeaex1

1 Answer

You can mask numeric values using to_numeric:

df['C'] = df['C'].mask(pd.to_numeric(df['C'], errors='coerce').notna())
df
      A     B    C
0  test   foo  xyz
1   hit   bar  NaN
2   hit  fish  NaN
3   hit   NaN  abc
4  test   val  NaN
5  test   val  NaN

to_numeric is the most general solution and should work regardless of whether you have a column of strings or mixed objects.
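For reference, here is a self-contained sketch of the same approach. The frame is rebuilt from the sample in the question, and the intermediate name is_numeric is only there for illustration:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; column C holds strings only.
df = pd.DataFrame({
    "A": ["test", "hit", "hit", "hit", "test", "test"],
    "B": ["foo", "bar", "fish", np.nan, "val", "val"],
    "C": ["xyz", "10", "90", "abc", "20", "90"],
})

# errors='coerce' turns anything that cannot be parsed into NaN, so
# notna() is True exactly where the value could be read as a number.
is_numeric = pd.to_numeric(df["C"], errors="coerce").notna()

# mask() replaces the entries where the condition is True with NaN.
df["C"] = df["C"].mask(is_numeric)
print(df)
```

Splitting the condition out like this makes it easy to inspect which rows are about to be blanked before overwriting the column.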


If it is a column of strings and you're only trying to retain strings of letters, str.isalpha may suffice:

df['C'] = df['C'].where(df['C'].str.isalpha())
df
      A     B    C
0  test   foo  xyz
1   hit   bar  NaN
2   hit  fish  NaN
3   hit   NaN  abc
4  test   val  NaN
5  test   val  NaN

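Note, however, that this keeps only strings made up entirely of alphabetic characters: anything containing digits, whitespace, or punctuation is replaced with NaN as well.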
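A small sketch of what str.isalpha accepts and rejects may help here. The sample values below are made up for illustration and are not from the question:

```python
import pandas as pd

# Hypothetical values chosen to probe the edges of str.isalpha.
s = pd.Series(["abc", "a1", "a b", "x-y", "10"])

# True only when every character in the string is alphabetic.
only_letters = s.str.isalpha()

# where() keeps entries where the condition is True and blanks the rest.
kept = s.where(only_letters)
print(kept)
```

Only "abc" survives; "a b" and "x-y" are dropped even though they contain no digits, so this variant is best suited to columns where the non-numeric values are plain words.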


If you have a column of mixed objects, yet another option is str.match (or really any str method that accepts an na argument) with na=False, so that non-string entries are treated as non-matches:

df['C'] = ['xyz', 10, 90, 'abc', 20, 90]

df['C'] = df['C'].where(df['C'].str.match(r'\D+$', na=False))
df
      A     B    C
0  test   foo  xyz
1   hit   bar  NaN
2   hit  fish  NaN
3   hit   NaN  abc
4  test   val  NaN
5  test   val  NaN
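The to_numeric and str.match approaches do not always agree. The following sketch compares them on a few made-up values (not from the question) to show where they differ:

```python
import pandas as pd

# Hypothetical mixed-object column: strings, an int, and a string with digits.
s = pd.Series(["xyz", 10, "3.5", "abc1"], dtype=object)

# Approach 1: blank whatever to_numeric can parse as a number.
by_number = s.mask(pd.to_numeric(s, errors="coerce").notna())

# Approach 2: keep only strings consisting entirely of non-digit characters.
# Non-string entries produce a missing result, which na=False turns into False.
by_regex = s.where(s.str.match(r"\D+$", na=False))

print(pd.DataFrame({"original": s, "to_numeric": by_number, "str.match": by_regex}))
```

Both blank 10 and "3.5", but "abc1" is kept by the to_numeric version (it is not a number) and dropped by the regex version (it contains a digit). Which behaviour is wanted depends on whether "numeric" means "parses as a number" or "contains any digit".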
cs95