Change column type to int64 pandas geopandas best practice

Question

I am trying to convert a column of floats whose decimal part is always .0 to int64. I found some older answers on this forum, but they no longer seem to work. Eventually I used:

df_test["column_name"] = df_test['column_name'].apply(lambda x: np.int64(x))

I was wondering whether this is best practice in pandas, and how it compares to to_numeric().

sacuL · Accepted Answer · 2018-02-15T17:43:17.613

7

In pandas, this would work:

df_test['column_name'] = df_test['column_name'].astype('int64')

As geopandas is built on top of pandas, this should work there as well. As for how it compares to to_numeric, both are vectorized and comparable as far as speed goes:

Testing the speed of the astype method against the to_numeric method on a modest-sized Series, I got an average of 0.00007522797584533691 seconds for astype and 0.0003248021602630615 seconds for to_numeric (roughly 75 µs versus 325 µs).
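For anyone who wants to repeat a comparison like this, here is a minimal sketch. The Series size, the number of runs, and the use of downcast='integer' in the to_numeric call are my assumptions, not necessarily what was used for the figures above, so expect different absolute numbers on your machine.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 100,000 floats that are all whole numbers.
s = pd.Series(np.arange(100_000, dtype="float64"))

n_runs = 100  # assumed repeat count; raise it for steadier averages

# Average wall-clock time per call for each conversion.
t_astype = timeit.timeit(lambda: s.astype("int64"), number=n_runs) / n_runs
t_to_numeric = timeit.timeit(
    lambda: pd.to_numeric(s, downcast="integer"), number=n_runs
) / n_runs

print(f"astype:     {t_astype:.6f} s per call")
print(f"to_numeric: {t_to_numeric:.6f} s per call")
```

Both calls run in compiled code over the whole column, so the difference between them is usually small next to a Python-level loop.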

edited Feb 15 '18 at 17:43

answered Feb 15 '18 at 17:20


Great! I passed np.int64 instead of the string "int64" – Rutger Hofste Feb 15 '18 at 17:43

Wish I could accept both answers, but the speed comparison is helpful – Rutger Hofste Feb 16 '18 at 08:59

score 2 · Answer 2 · answered Feb 15 '18 at 17:21

Your best option, if you want the smallest integer type that can hold the values:

df_test["column_name"] = pd.to_numeric(df_test['column_name'], downcast='integer')

This is vectorised; Series.apply is a loop and is slow.
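To see the gap between the loop and the vectorised call for yourself, something along these lines should do. The data and the repeat count are made up for illustration; this is a rough sketch, not a careful benchmark.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 100,000 whole-number floats.
values = pd.Series(np.arange(100_000, dtype="float64"))

repeats = 5  # assumed; kept small because the apply version is slow

# Element-by-element conversion, as in the question.
t_apply = timeit.timeit(
    lambda: values.apply(lambda x: np.int64(x)), number=repeats
) / repeats

# Whole-column conversion in one call.
t_vectorised = timeit.timeit(
    lambda: pd.to_numeric(values, downcast="integer"), number=repeats
) / repeats

print(f"apply:      {t_apply:.6f} s per call")
print(f"to_numeric: {t_vectorised:.6f} s per call")
```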

If you really need np.int64, see @sacul's solution.
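A small sketch of the difference between the two approaches. The frame below is a made-up stand-in for df_test; the point is only that downcast='integer' picks the smallest integer dtype that fits the data, which may not be int64, while astype('int64') always gives int64.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for df_test.
df_test = pd.DataFrame({"column_name": [1.0, 2.0, 300.0]})

# Smallest integer dtype that can hold the values.
downcast = pd.to_numeric(df_test["column_name"], downcast="integer")

# Explicit 64-bit integers, regardless of the values.
as_int64 = df_test["column_name"].astype("int64")

print(downcast.dtype)
print(as_int64.dtype)
```

If downstream code (for example a file writer or a join key) expects exactly int64, the astype route is the safer choice.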
