Converting a series of floats to int - some NaNs in list are causing an error 'cannot convert float NaN to integer'. How to skip NaNs?

Question

I have a very large column of phone numbers in a pandas dataframe, and they're in float format: 3.52831E+11. There are also NaNs present.

I am trying to convert the numbers to int and it's throwing an error that NaNs can't be converted to int. Fair enough. But I can't seem to get around this.

Here's a sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'number': ['3.578724e+11', '3.568376e+11', '3.538884e+11', np.nan]})


    number
0   3.578724e+11
1   3.568376e+11
2   3.538884e+11
3   NaN


# My first attempt: I try to convert them with int(), but I get 'cannot convert float NaN to integer'.

df['number'] = [int(x) for x in df['number'] if isinstance(x, float)]


# I have also tried the below, but I get SyntaxError: invalid syntax.

df['number'] = [int(x) for x in df['number'] if x not None]


# and then this one, but the error is: TypeError: must be real number, not str

df['number'] = [int(x) for x in df['number'] if not math.isnan(x) and isinstance(x, float)]

I'd appreciate some pointers on this. I thought at least one of these would work.

Thanks folks

`pd.to_numeric(df.number,errors='coerce').dropna().astype(int)` ?? — anky, Jun 14 '19 at 17:14
OK, I'm going to reopen this because the solution is a little more nuanced than presented in the dupe. — cs95, Jun 14 '19 at 17:27
@anky_91 this is the result. Still a float: `0 -2.147484e+09 1 -2.147484e+09 2 -2.147484e+09 3 -2.147484e+09 4 -2.147484e+09` — SCool, Jun 14 '19 at 17:28
@BlueRineS `invalid literal for int() with base 10: '3.578724e+11'` — SCool, Jun 14 '19 at 17:32

score 1 · Accepted Answer · answered Jun 14 '19 at 17:28

Since pandas 0.24, there is a nullable integer type. The first step is to convert your strings (objects) to float, and then to nullable int:

df.astype('float').astype(pd.Int64Dtype())

         number
0  357872400000
1  356837600000
2  353888400000
3           NaN
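
To illustrate why the intermediate float step is needed, here is a minimal sketch using one value from the question's sample. It assumes the values are stored as strings, as in the sample; the name `as_int` is only for illustration.

```python
# int() cannot parse a string written in scientific notation...
try:
    int('3.578724e+11')
except ValueError as exc:
    print('int() failed:', exc)

# ...but float() can, and the resulting whole number converts cleanly to int.
as_int = int(float('3.578724e+11'))
print(as_int)
```

This matches the `invalid literal for int() with base 10` error reported in the comments.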

As a shorthand, you can also write:

df.astype('float').astype('Int64')

         number
0  357872400000
1  356837600000
2  353888400000
3           NaN
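
If the column might also contain strings that are not numbers at all, a possible variation combines the `pd.to_numeric(..., errors='coerce')` call suggested in the comments with the nullable type. This is a sketch under that assumption, not part of the original answer, and it needs pandas 0.24 or later:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'number': ['3.578724e+11', '3.568376e+11', '3.538884e+11', np.nan]})

# Parse the strings as floats; anything unparseable becomes NaN instead of raising.
as_float = pd.to_numeric(df['number'], errors='coerce')

# Cast to the nullable integer type so the missing value can stay in the column.
df['number'] = as_float.astype('Int64')
print(df)
```

The row count is unchanged, so the result can be assigned straight back to the column.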

On older versions, your only option will be to drop NaNs and convert:

df.dropna(subset=['number']).astype({'number':float}).astype({'number':int})

         number
0  357872400000
1  356837600000
2  353888400000
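
A self-contained version of the drop-and-convert approach might look like the sketch below. Spelling out `'int64'` instead of `int` is my own assumption, not something from the answer: it keeps the integer width from depending on the platform default. A 32-bit default is one plausible cause of the `-2.147484e+09` values reported in the comments, but the thread does not confirm that.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'number': ['3.578724e+11', '3.568376e+11', '3.538884e+11', np.nan]})

# Drop rows with a missing number, parse the strings as floats,
# then cast to an explicit 64-bit integer.
converted = (
    df.dropna(subset=['number'])
      .astype({'number': 'float64'})
      .astype({'number': 'int64'})
)
print(converted)
```

Note that `converted` has one row fewer than `df`, so assigning it back to the original column would reintroduce a missing value.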
