Convert float64 column to int64 in Pandas

Question

I tried to convert a column from data type float64 to int64 using:

df['column name'].astype(int64)

but got an error:

NameError: name 'int64' is not defined

The column holds numbers of people but is formatted as 7500000.0. Any idea how I can simply change this float64 into int64?

jezrael · Accepted Answer · 2019-05-30T09:11:11.303

Solution for pandas 0.24+ for converting numeric columns with missing values:

import numpy as np
import pandas as pd

df = pd.DataFrame({'column name':[7500000.0,7500000.0, np.nan]})
print (df['column name'])
0    7500000.0
1    7500000.0
2          NaN
Name: column name, dtype: float64

df['column name'] = df['column name'].astype(np.int64)

ValueError: Cannot convert non-finite values (NA or inf) to integer

#http://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
df['column name'] = df['column name'].astype('Int64')
print (df['column name'])
0    7500000
1    7500000
2        NaN
Name: column name, dtype: Int64
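A minimal sketch of checking the result of the 'Int64' conversion, assuming pandas 1.0 or later (where the missing value is displayed as <NA> rather than NaN). 'Int64' with a capital I is pandas' nullable integer extension dtype, which is different from NumPy's int64:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column name': [7500000.0, 7500000.0, np.nan]})

# 'Int64' (capital I) keeps missing values in a separate mask,
# so the column no longer needs to be float to hold them
df['column name'] = df['column name'].astype('Int64')

print(df['column name'].dtype)            # Int64
print(df['column name'].isna().tolist())  # [False, False, True]
print(df['column name'].iloc[0])          # 7500000
```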

I think you need to cast to numpy.int64:

df['column name'].astype(np.int64)

Sample:

df = pd.DataFrame({'column name':[7500000.0,7500000.0]})
print (df['column name'])
0    7500000.0
1    7500000.0
Name: column name, dtype: float64

df['column name'] = df['column name'].astype(np.int64)
#same as (pd.np was deprecated in pandas 1.0 and removed in 2.0)
#df['column name'] = df['column name'].astype(pd.np.int64)
print (df['column name'])
0    7500000
1    7500000
Name: column name, dtype: int64

If the column contains NaNs, replace them with some integer (e.g. 0) using fillna first, because NaN is a float value and cannot be stored in an int64 column:

df = pd.DataFrame({'column name':[7500000.0,np.nan]})

df['column name'] = df['column name'].fillna(0).astype(np.int64)
print (df['column name'])
0    7500000
1          0
Name: column name, dtype: int64

Also check the documentation on missing data casting rules.
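As a rough illustration of those casting rules: a NumPy int64 column cannot hold NaN, so as soon as a missing value appears pandas upcasts the whole column to float64. The sketch below uses reindex as just one hypothetical way of introducing a missing value:

```python
import pandas as pd

s = pd.Series([7500000, 7500000], dtype='int64')
print(s.dtype)  # int64

# Label 2 has no data, so pandas must insert NaN; int64 cannot store it,
# so the whole Series is upcast to float64
s2 = s.reindex([0, 1, 2])
print(s2.dtype)     # float64
print(s2.tolist())  # [7500000.0, 7500000.0, nan]
```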

EDIT:

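Converting values with NaNs through the underlying NumPy array (.values) is buggy: no error is raised and NaN silently becomes a meaningless integer: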

df = pd.DataFrame({'column name':[7500000.0,np.nan]})

df['column name'] = df['column name'].values.astype(np.int64)
print (df['column name'])
0                7500000
1   -9223372036854775808
Name: column name, dtype: int64
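Because going through .values bypasses the NaN check, one way to protect yourself is to validate the values before casting. This is a sketch with a hypothetical helper `to_int64_checked` (not a pandas API); it refuses NaN/inf and also non-whole numbers, which astype would otherwise silently truncate:

```python
import numpy as np
import pandas as pd

def to_int64_checked(s: pd.Series) -> pd.Series:
    """Hypothetical helper: cast to int64 only if every value is finite and whole."""
    values = s.to_numpy(dtype='float64')
    if not np.isfinite(values).all():
        raise ValueError('column contains NaN or inf; fill or drop them first')
    if not (values == np.trunc(values)).all():
        raise ValueError('column contains non-whole numbers; round them first')
    return s.astype(np.int64)

df = pd.DataFrame({'column name': [7500000.0, np.nan]})

message = None
try:
    to_int64_checked(df['column name'])
except ValueError as err:
    message = str(err)  # the NaN is reported instead of becoming garbage
print(message)

filled = to_int64_checked(df['column name'].fillna(0))
print(filled.tolist())  # [7500000, 0]
```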

Got error: ValueError: Cannot convert NA to integer for the first code .astype(np.int64). — MCG Code, May 13 '17 at 18:39
If you need to convert `NaN` to `0`, use `fillna(0)`; see the code in my second paragraph: `df['column name'] = df['column name'].fillna(0).astype(np.int64)`. — jezrael, May 14 '17 at 12:21
Does not work. In Python 3.5, `df_test["column"] = gdf_test['column'].apply(lambda x: np.int64(x))` worked. — Rutger Hofste, Feb 15 '18 at 16:55
How can I check whether a particular column has only 0 after the decimal point? The problem occurred because of empty values, which turned the column into float64, so now I have to convert it to int64. The columns are not fixed, so I need something generic. What can I do? — Vikas Chauhan, May 19 '20 at 19:30
Once `NaN` is filled with other integer values, `np.int64` is no longer required; `int64` works just fine. But could anyone briefly explain the difference between these two kinds of int64? — Jia Gao, Oct 06 '20 at 01:00
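For the generic case raised in the comments (columns not known in advance), here is a sketch using a hypothetical helper `whole_float_columns_to_int`. It assumes the nullable 'Int64' dtype (pandas 0.24+) is acceptable, so missing values can stay missing, and it only converts float columns whose non-missing values all have no fractional part:

```python
import numpy as np
import pandas as pd

def whole_float_columns_to_int(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: convert float columns whose non-missing values are
    all whole numbers to the nullable 'Int64' dtype; other columns are unchanged."""
    out = df.copy()
    for col in list(out.columns):
        if not pd.api.types.is_float_dtype(out[col]):
            continue
        values = out[col].dropna()
        # x % 1 == 0 holds only when x has no fractional part
        if np.isfinite(values).all() and (values % 1 == 0).all():
            out[col] = out[col].astype('Int64')
    return out

df = pd.DataFrame({
    'people': [7500000.0, np.nan, 120.0],  # whole numbers plus a missing value
    'ratio': [0.5, 1.0, np.nan],            # genuine fractions, left as float64
    'name': ['a', 'b', 'c'],                # not a float column, left alone
})
converted = whole_float_columns_to_int(df)
print(converted.dtypes)
```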

MSeifert · Answer 2 · 2017-05-13T18:19:08.173

You need to pass in the string 'int64':

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1.0, 2.0]})  # some test dataframe

>>> df['a'].astype('int64')
0    1
1    2
Name: a, dtype: int64

There are some alternative ways to specify 64-bit integers:

>>> df['a'].astype('i8')      # integer with 8 bytes (64 bit)
0    1
1    2
Name: a, dtype: int64

>>> import numpy as np
>>> df['a'].astype(np.int64)  # native numpy 64 bit integer
0    1
1    2
Name: a, dtype: int64

Or call np.int64 directly on your column (but note that it returns a numpy.ndarray, not a Series):

>>> np.int64(df['a'])
array([1, 2], dtype=int64)
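One detail all of these approaches share: casting a float to int64 truncates toward zero instead of rounding. For counts like 7500000.0 this makes no difference, but for values with a fractional part this sketch shows the difference, with made-up sample values:

```python
import pandas as pd

s = pd.Series([7.9, -7.9, 2.5])

# astype drops the fractional part (truncation toward zero)
truncated = s.astype('int64').tolist()
print(truncated)  # [7, -7, 2]

# Round first to get the nearest integer; note that Series.round
# rounds exact halves to the nearest even number (2.5 -> 2.0)
rounded = s.round().astype('int64').tolist()
print(rounded)    # [8, -8, 2]
```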

edited May 13 '17 at 18:19

answered May 13 '17 at 18:09


ValueError: Cannot convert NA to integer – MCG Code May 13 '17 at 18:46
@MCGCode That's not so good, because `NaN`s can't be converted to integers (at least not with a meaningful value because only floats support NaN and Inf). What value should these have in the result? – MSeifert May 13 '17 at 18:50
I guess 0. If it is NaN, it's not populated, so I'd keep it at zero. – MCG Code May 14 '17 at 12:15
Then you can use `df['column name'].fillna(0)` instead of `df['column name']` and use any approach I mentioned above. – MSeifert May 14 '17 at 13:02

sparrow · Answer 3 · 2020-04-13T21:06:01.743

This seems to be a little buggy in Pandas 0.23.4?

If there are np.nan values then this will throw an error as expected:

df['col'] = df['col'].astype(np.int64)

But when "ignore" is used, it doesn't convert any values from float to int at all, although I expected it to:

df['col'] = df['col'].astype(np.int64,errors='ignore')

It worked if I first replaced the np.nan values:

df['col'] = df['col'].fillna(0)
df['col'] = df['col'].astype(np.int64)

Now I can't figure out how to get the null values back in place of the zeroes, since this converts everything back to float again:

df['col']  = df['col'].replace(0,np.nan)
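In pandas 0.24 and later, one way around this, assuming a plain NumPy int64 column is not strictly required, is to skip fillna altogether and cast to the nullable 'Int64' dtype. Missing values then never become zeroes, so real zeroes and missing values stay distinguishable (the sample data here is made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': [7500000.0, np.nan, 0.0]})

# No fillna(0): the NaN stays missing and the genuine 0.0 becomes 0
df['col'] = df['col'].astype('Int64')

print(df['col'].dtype)            # Int64
print(df['col'].isna().tolist())  # [False, True, False]
```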

score 4 · Answer 4 · answered May 21 '20 at 12:09

Consider using the nullable integer dtype 'Int64' (note the capital I):

df['column name'].astype('Int64')

NaN values are kept as missing values instead of raising an error (pandas 1.0+ displays them as <NA>).

Muhammad Bin Ali

To replace the column: `df['column name'] = df['column name'].astype('Int64')` – Nando Jun 08 '23 at 16:27

score 1 · Answer 5 · answered Dec 11 '21 at 12:07

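If you want to convert float64 to int64 with NumPy, pass an explicit 64-bit type as in the example below (np.int was only an alias for the built-in int; it was deprecated in NumPy 1.20 and removed in 1.24):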

import numpy as np
df['column name'] = df['column name'].astype(np.int64)

Mohamed Bra
