0

I tried this on Pandas v0.12 from ActiveState (Python 2.7.2) and Pandas v0.14 from Anaconda (Python 2.7.8).

When every value in a DataFrame column is non-numeric, convert_objects(convert_numeric=True) leaves the column unchanged instead of converting its values to NaN. When at least one value in the column can be converted to a number, all of the non-numeric values are correctly converted to NaN.

import pandas as pd
pd.DataFrame({"c1":["1","2","3"], "c2":["a","b","c"]}).convert_objects(convert_numeric=True)

  c1 c2
0   1  a
1   2  b
2   3  c

pd.DataFrame({"c1":["1","2","3"], "c2":["a","b","4"]}).convert_objects(convert_numeric=True)

   c1  c2
0   1 NaN
1   2 NaN
2   3   4

I'm reading user-supplied data, so I convert it to numeric and then handle the NaN values appropriately.

The only workaround I've found is to add a dummy row of floats (0.0), perform the conversion, and then delete the row.

I can't use .astype(float) because it raises an exception on the non-numeric values.
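
To illustrate why .astype(float) is not an option: a single unparseable string makes the whole cast fail with a ValueError rather than producing NaN. This is a small check, assuming any pandas version:

```python
import pandas as pd

df = pd.DataFrame({"c1": ["1", "2", "3"], "c2": ["a", "b", "4"]})

raised = False
try:
    # float("a") is invalid, so the entire astype call fails
    df.astype(float)
except ValueError as exc:
    raised = True
    print("astype(float) failed:", exc)
```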

How can I ensure that all non-numeric values are converted to NaN?

Does anyone know whether this behavior is still present in Pandas v0.15 or with Python 3+?

2 Answers

1

I don't think there's a neat way to do this (perhaps there should be a force argument to astype?).

Similar to an answer on another question, you could use applymap to convert each element individually:

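def to_float_or_nan(x):
    # Convert a single value to float; anything float() rejects becomes NaN
    try:
        return float(x)
    except (TypeError, ValueError):  # TypeError covers values such as None
        return float('nan')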

df.applymap(to_float_or_nan)

With your inputs, this gives:

In [11]: pd.DataFrame({"c1":["1","2","3"], "c2":["a","b","4"]}).applymap(to_float_or_nan)
Out[11]:
   c1  c2
0   1 NaN
1   2 NaN
2   3   4

In [12]: pd.DataFrame({"c1":["1","2","3"], "c2":["a","b","c"]}).applymap(to_float_or_nan)
Out[12]:
   c1  c2
0   1 NaN
1   2 NaN
2   3 NaN
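
On later pandas releases (0.17 and up), convert_objects was deprecated and pd.to_numeric with errors="coerce" became the usual way to do this conversion. A minimal sketch, assuming a pandas version that has pd.to_numeric, applies it column by column. An all-non-numeric column is also coerced to NaN:

```python
import pandas as pd

df = pd.DataFrame({"c1": ["1", "2", "3"], "c2": ["a", "b", "c"]})

# DataFrame.apply forwards errors="coerce" to pd.to_numeric for each column;
# every value that cannot be parsed becomes NaN, even if the whole column fails
numeric = df.apply(pd.to_numeric, errors="coerce")
print(numeric)
print(numeric.dtypes)
```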
Community
  • 1
  • 1
Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
1

Set 'nan' wherever a value is not a number:

>>> import pandas as pd

>>> df1 = pd.DataFrame({"c1":["1","2","3"], "c2":["a","b","c"]})
>>> df2 = pd.DataFrame({"c1":["1","2","3"], "c2":["a","b","4"]})

>>> M = lambda x: x.isdigit()

>>> df1[~df1.applymap(M)]='nan'
>>> df2[~df2.applymap(M)]='nan'

>>> df1
  c1   c2
0  1  nan
1  2  nan
2  3  nan

>>> df2
  c1   c2
0  1  nan
1  2  nan
2  3    4

Hope this helps.
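
The isdigit() check rejects strings such as "4.5" or "-3", and assigning the string 'nan' leaves the columns as object dtype rather than holding real NaN values. A minimal sketch of the same mask-then-replace idea that addresses both, assuming a hypothetical helper looks_numeric that tests whether float() accepts a value:

```python
import pandas as pd


def looks_numeric(x):
    # Hypothetical helper: True if float() can parse the value
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False


df = pd.DataFrame({"c1": ["1", "2", "3"], "c2": ["a", "b", "4.5"]})

# Boolean mask, built column by column with Series.map
mask = df.apply(lambda col: col.map(looks_numeric))

# where() keeps values where the mask is True and puts NaN elsewhere;
# the remaining strings are all parseable, so astype(float) succeeds
result = df.where(mask).astype(float)
print(result)
```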

Shahriar
  • 13,460
  • 8
  • 78
  • 95