
I am trying to convert a column of floats whose decimal part is always .0 to int64. I found some older answers on this forum, but they no longer seem to work. Eventually I used:

df_test["column_name"] = df_test['column_name'].apply(lambda x: np.int64(x))

I was wondering whether this is best practice in pandas and how it compares to to_numeric().

jpp
Rutger Hofste

2 Answers


In pandas, this would work:

df_test['column_name'] = df_test['column_name'].astype('int64')

Since geopandas is built on top of pandas, this should work there as well. As for how it compares to to_numeric: both are vectorized, so neither runs the Python-level loop that apply does.

Testing astype against to_numeric on a modest-sized Series, I got an average of 0.00007522797584533691 seconds for astype and 0.0003248021602630615 seconds for to_numeric. astype was roughly four times faster in this test, but both finish in well under a millisecond.
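A comparison like this can be reproduced with a minimal sketch along the following lines. The Series contents, its size, the repeat count, and the use of downcast='integer' in the to_numeric call are all assumptions for illustration, not the setup behind the numbers quoted above, so expect different absolute timings.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 100,000 floats whose fractional part is always .0
s = pd.Series(np.tile(np.arange(100.0), 1000))

# Both conversions are vectorized and yield the same integer values
via_astype = s.astype("int64")
via_to_numeric = pd.to_numeric(s, downcast="integer")

# Average wall-clock time per call over 100 repetitions (assumed count)
n_runs = 100
t_astype = timeit.timeit(lambda: s.astype("int64"), number=n_runs) / n_runs
t_to_numeric = timeit.timeit(
    lambda: pd.to_numeric(s, downcast="integer"), number=n_runs
) / n_runs

print(f"astype:     {t_astype:.8f} s per call")
print(f"to_numeric: {t_to_numeric:.8f} s per call")
```

The absolute numbers depend on the machine, the pandas version, and the Series length, so only the ratio between the two is really informative.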

sacuL

Your best option, if you want the smallest integer dtype that can hold the values:

df_test["column_name"] = pd.to_numeric(df_test['column_name'], downcast='integer')

This is vectorised, whereas pd.Series.apply is a Python-level loop and is slow.

If you really need np.int64, see @sacul's solution.
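To make the difference between the two approaches concrete, here is a small sketch. The DataFrame contents are made up for illustration; only the column name is taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: floats with only .0 as decimals
df_test = pd.DataFrame({"column_name": [1.0, 2.0, 3.0]})

# downcast='integer' picks the smallest integer dtype that holds the values
downcast = pd.to_numeric(df_test["column_name"], downcast="integer")

# astype('int64') always yields a 64-bit integer, regardless of the values
fixed = df_test["column_name"].astype("int64")

print(downcast.dtype)
print(fixed.dtype)
```

With values this small, the downcast result is narrower than int64, which saves memory but may matter if downstream code expects a specific dtype.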

jpp