Pandas convert_objects(convert_numeric=True) not producing np.nan for full series of non-numeric values

Question

Tried on Pandas v0.12 from ActiveState (Python 2.7.2) and Pandas v0.14 from Anaconda (Python 2.7.8).

When a DataFrame column contains only values that can't be converted to numbers, none of its values are converted to NaN. When one or more values can be converted, all of the non-numeric values in that column are properly converted to NaN.

import pandas as pd
pd.DataFrame({"c1":["1","2","3"], "c2":["a","b","c"]}).convert_objects(convert_numeric=True)

  c1 c2
0   1  a
1   2  b
2   3  c

pd.DataFrame({"c1":["1","2","3"], "c2":["a","b","4"]}).convert_objects(convert_numeric=True)

   c1  c2
0   1 NaN
1   2 NaN
2   3   4

I'm reading user-supplied data, so I convert it to numeric and then handle the NaN values appropriately.

The only workaround I have found is to add a dummy row full of floats (0.0), perform the conversion, and then delete the row.

I can't use ".astype(float)" because it raises an exception on the non-numeric values.

How can I ensure all non-numeric values are converted to NaN?

Does anyone know whether this behavior also occurs in Pandas v0.15 or under Python 3+?

score 1 · Answer 1 · edited May 23 '17 at 12:29

I don't think there's a neat way to do this (perhaps there should be a force argument to astype?).

In a similar vein to another question you could use applymap:

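def to_float_or_nan(x):
    # Return x as a float, or NaN when it cannot be parsed as a number.
    try:
        return float(x)
    except (ValueError, TypeError):  # TypeError covers None and similar objects
        return float('nan')

df.applymap(to_float_or_nan)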

With your inputs, this gives:

In [11]: pd.DataFrame({"c1":["1","2","3"], "c2":["a","b","4"]}).applymap(to_float_or_nan)
Out[11]:
   c1  c2
0   1 NaN
1   2 NaN
2   3   4

In [12]: pd.DataFrame({"c1":["1","2","3"], "c2":["a","b","c"]}).applymap(to_float_or_nan)
Out[12]:
   c1  c2
0   1 NaN
1   2 NaN
2   3 NaN
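For readers on newer pandas releases (0.17 and later, as far as I know, so not the 0.12/0.14 versions from the question), a column-wise alternative is `pd.to_numeric` with `errors="coerce"`. The following is only a sketch under that version assumption; the names `df` and `converted` are just for illustration:

```python
import pandas as pd

# Same all-non-numeric case as in the question.
df = pd.DataFrame({"c1": ["1", "2", "3"], "c2": ["a", "b", "c"]})

# Apply pd.to_numeric to each column; errors="coerce" turns every
# value that cannot be parsed into NaN instead of raising.
converted = df.apply(pd.to_numeric, errors="coerce")

print(converted)
print(converted.dtypes)
```

Unlike the element-wise `applymap` approach, this works one column at a time, so a fully numeric column such as `c1` can keep an integer dtype while `c2` becomes a float column of NaN.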

score 1 · Accepted Answer · answered Dec 21 '14 at 07:49


Set the string 'nan' wherever the value is not made up of digits:

>>> import pandas as pd

>>> df1 = pd.DataFrame({"c1":["1","2","3"], "c2":["a","b","c"]})
>>> df2 = pd.DataFrame({"c1":["1","2","3"], "c2":["a","b","4"]})

>>> M = lambda x: str(x).isdigit()

>>> df1[~df1.applymap(M)]='nan'
>>> df2[~df2.applymap(M)]='nan'

>>> df1
  c1   c2
0  1  nan
1  2  nan
2  3  nan

>>> df2
  c1   c2
0  1  nan
1  2  nan
2  3    4

Hope this helps.
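One caveat worth checking before relying on `isdigit()` as the test for "is a number": it only accepts strings made up entirely of digits, so decimals, negative numbers and exponent notation are treated as non-numeric. A small plain-Python sketch of the difference (the `samples` list and the `parses_as_float` helper are made up for illustration):

```python
samples = ["4", "1.5", "-3", "1e3", "a"]

def parses_as_float(text):
    # True when float() accepts the string, False otherwise.
    try:
        float(text)
        return True
    except ValueError:
        return False

# Compare the two notions of "numeric" for each sample string.
isdigit_result = {s: s.isdigit() for s in samples}
float_result = {s: parses_as_float(s) for s in samples}

for s in samples:
    print(s, isdigit_result[s], float_result[s])
```

Only "4" passes `isdigit()`, while `float()` accepts everything except "a". If the user-supplied data can contain decimals or signs, a `float()`-based check is probably the safer mask.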


Shahriar


Thanks. The convert_objects behavior seems like a gotcha. – Daniel Fudge Dec 21 '14 at 14:00
You might need `M = lambda x: str(x).isdigit()==True` because numeric types like `int` don't support `isdigit()`. – kpie Sep 29 '16 at 05:12
