
I am having trouble converting a column that contains both two-digit numbers stored as strings (type: str) and NaN values (type: float64). I want a new column with NaN where there was NaN and integers where there was a two-digit string. For example, I want to obtain column YearBirth2 from column YearBirth1 like this:

YearBirth1  #numbers here are formatted as strings: type(YearBirth1[0])=str
        34  # and NaN are floats: type(YearBirth1[2])=float64.
        76
       Nan
        09
       Nan
        91

YearBirth2  #numbers here are formatted as integers: type(YearBirth2[0])=int
        34  #NaN can remain floats as they were. 
        76
       Nan
         9
       Nan
        91

I have tried this:

csv['YearBirth2'] = (csv['YearBirth1']).astype(int)

As I expected, I got this error:

ValueError: cannot convert float NaN to integer

So I tried this:

csv['YearBirth2'] = (csv['YearBirth1']!=NaN).astype(int)

And got this error:

NameError: name 'NaN' is not defined

Finally I have tried this:

csv['YearBirth2'] = (csv['YearBirth1']!='NaN').astype(int)

No error, but when I checked column YearBirth2, this was the result:

YearBirth2:
         1
         1
         1
         1
         1
         1

Very bad. I think the idea is right, but I cannot make Python understand what I mean by NaN. Or maybe the method I tried is wrong.

I also tried the pd.to_numeric() method, but that way I get floats, not integers.

Any help?! Thanks to everyone!

P.S.: csv is the name of my DataFrame. Sorry if I am not very clear, I am still improving my English!

mik.ferrucci
  • Well, you can't: `NaN` cannot be represented as an integer, so you need to accept floats or replace `NaN` with something that can be represented by integers – EdChum Nov 07 '16 at 11:34
  • Ok, no problem if NaN remains a float, but I would like the two-digit strings to be converted to int, not float. Is it really impossible?! – mik.ferrucci Nov 07 '16 at 11:37
  • It is impossible in pandas. You can normally have mixed dtypes, but when it comes to pure numeric types, the dtype needs to be homogeneous. Your original column has strings and `NaN`, which is allowed; ints, strings and floats together would also be allowed; but a pure numeric column has to be all ints or all floats – EdChum Nov 07 '16 at 11:40
  • Ok, I didn't know. Thank you EdChum for this explanation! – mik.ferrucci Nov 07 '16 at 11:44
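
The dtype rule described in the comments above can be checked with a small sketch. The Series names here are made up for illustration and are not part of the original question:

```python
import numpy as np
import pandas as pd

# Hypothetical example Series, only to illustrate the dtype rule.
ints_only = pd.Series([34, 76, 9])                    # all ints -> integer dtype
ints_with_nan = pd.Series([34, 76, np.nan])           # NaN forces promotion to float
mixed = pd.Series([34, "76", np.nan], dtype=object)   # object column keeps each Python value as-is

print(ints_only.dtype)
print(ints_with_nan.dtype)
print(mixed.dtype, [type(v).__name__ for v in mixed])
```

An object column can hold real Python ints next to NaN, but it loses the speed and arithmetic behaviour of a numeric column, so it is rarely what you want.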

1 Answer


You can use to_numeric, but it is impossible to get int together with NaN values. They are always converted to float: see NA type promotions.

df['YearBirth2'] = pd.to_numeric(df.YearBirth1, errors='coerce')
print (df)
  YearBirth1  YearBirth2
0         34        34.0
1         76        76.0
2        Nan         NaN
3         09         9.0
4        Nan         NaN
5         91        91.0
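
As a possible alternative, assuming a newer pandas than this thread was written for (0.24 or later), the nullable integer dtype "Int64" (capital I) can hold integers alongside missing values. A minimal sketch, with the sample frame rebuilt from the question's data:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question: two-digit strings plus real NaN values.
df = pd.DataFrame({"YearBirth1": ["34", "76", np.nan, "09", np.nan, "91"]})

# Step 1: parse the strings to numbers (missing values stay missing).
# Step 2: cast to the nullable integer dtype "Int64", which stores
#         missing entries as <NA> instead of forcing the column to float.
df["YearBirth2"] = pd.to_numeric(df["YearBirth1"], errors="coerce").astype("Int64")

print(df["YearBirth2"].dtype)
print(df)
```

Note that this is the pandas extension dtype "Int64", not NumPy's "int64"; missing entries show up as <NA> rather than NaN.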
jezrael