
I have a data frame like this:

import pandas as pd

a = pd.DataFrame({'foo': [1, 2, 3, 'str']})

    foo
0   1
1   2
2   3
3   str

I want to set the data type to int32:

a['foo'].astype('int32')

but I got an error message:

ValueError: invalid literal for int() with base 10: 'str'

How can I set values of an unexpected type to NA? In my case, I'd like to get a data frame like the following:

    foo
0   1
1   2
2   3
3   NA
freefrog
  • Possible duplicate of [How to replace all non-numeric entries with NaN in a pandas dataframe?](https://stackoverflow.com/questions/41938549/how-to-replace-all-non-numeric-entries-with-nan-in-a-pandas-dataframe) – Georgy Mar 13 '18 at 12:33
  • You cannot have `dtype 'int32'` and `NA` values at the same time in a `Series`. – Stop harming Monica Mar 13 '18 at 12:43

2 Answers


The best approach is to convert all values to floats using to_numeric with the parameter errors='coerce', because NaN is itself a float:

# df is the DataFrame from the question (named a there)
s = pd.to_numeric(df['foo'], errors='coerce')
print(s)
0    1.0
1    2.0
2    3.0
3    NaN
Name: foo, dtype: float64
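
To make the coercion step concrete, here is a minimal sketch that rebuilds the question's frame, reports which values were turned into NaN, and writes the result back into the column. The names `converted` and `bad_rows` are only illustrative and do not come from the original answer.

```python
import pandas as pd

a = pd.DataFrame({'foo': [1, 2, 3, 'str']})

# errors='coerce' replaces anything that cannot be parsed as a number with NaN
converted = pd.to_numeric(a['foo'], errors='coerce')

# Values that were present originally but became NaN after coercion
bad_rows = a.loc[converted.isna() & a['foo'].notna(), 'foo']
print(bad_rows.tolist())

# Write the numeric column back; it is float64 because NaN is a float
a['foo'] = converted
print(a['foo'].dtype)
```

Checking `bad_rows` before overwriting the column is a cheap way to see what was discarded, since `errors='coerce'` is otherwise silent.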

But if you really need integers alongside the NaN (which stays a float), this hack is possible; note that the result has object dtype:

s = df['foo'].where(df['foo'].apply(lambda x: isinstance(x, int)))
print(s)
0      1
1      2
2      3
3    NaN
Name: foo, dtype: object

print(s.apply(type))
0      <class 'int'>
1      <class 'int'>
2      <class 'int'>
3    <class 'float'>
Name: foo, dtype: object
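
As an alternative to the object-dtype hack, newer pandas releases offer nullable integer dtypes. The following is a minimal sketch, assuming pandas 1.0 or later, where the capitalised dtype name `'Int64'` (or `'Int32'`) selects the nullable extension type; this is not part of the original answer, and the name `result` is only illustrative.

```python
import pandas as pd

a = pd.DataFrame({'foo': [1, 2, 3, 'str']})

# Coerce to float first (invalid values become NaN), then cast to the
# nullable integer dtype, which stores missing entries as <NA>
result = pd.to_numeric(a['foo'], errors='coerce').astype('Int64')
print(result)
```

Unlike the object-dtype version, this keeps a real integer dtype, so arithmetic and aggregation stay vectorised.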
jezrael

Or use str.isalpha, which returns NaN for non-string values, so notnull() flags every string entry:

a.foo.mask(a.foo.str.isalpha().notnull())
Out[331]: 
0      1
1      2
2      3
3    NaN
Name: foo, dtype: object
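
To show why the `notnull()` call matters, here is a small sketch that breaks the one-liner into steps. The extra value `'12'` and the names `is_alpha`, `is_string`, and `masked` are added purely for illustration and are not in the original answer.

```python
import pandas as pd

# '12' is an extra, hypothetical value: a string that is not alphabetic
a = pd.DataFrame({'foo': [1, 2, 3, 'str', '12']})

# .str.isalpha() yields NaN for non-strings, True for 'str', False for '12'
is_alpha = a['foo'].str.isalpha()

# notnull() is True for every string, whether alphabetic or not
is_string = is_alpha.notnull()

# mask() replaces the flagged positions with NaN
masked = a['foo'].mask(is_string)
print(masked)
```

So this approach removes every string entry, including numeric-looking ones such as `'12'`, whereas `to_numeric` with `errors='coerce'` would parse `'12'` as a number.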
BENY