What is the most efficient way to convert a geopandas GeoDataFrame into a pandas DataFrame? Below is the method I use. Is there another method that is more efficient, or generally less likely to produce errors?

import geopandas as gpd
import pandas as pd

# assuming I have a shapefile named shp1.shp
gdf1 = gpd.read_file('shp1.shp')

# then for the conversion, I drop the last column (geometry) and specify the column names for the new df
df1 = pd.DataFrame(gdf1.iloc[:,:-1].values, columns = list(gdf1.columns.values)[:-1] )
jberrio

1 Answer

You don't need to convert the GeoDataFrame to an array of values; you can pass it directly to the DataFrame constructor:

df1 = pd.DataFrame(gdf)

The above will keep the 'geometry' column, which is not a problem: a normal DataFrame can hold it like any other column. But if you actually want to drop that column, you can do the following (assuming the column is called 'geometry'):

df1 = pd.DataFrame(gdf.drop(columns='geometry'))
# for older versions of pandas (< 0.21), write the drop part as: gdf.drop('geometry', axis=1)
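
As a minimal sketch of the two conversions above, here is a self-contained version that builds a small GeoDataFrame in memory instead of reading a shapefile. The column names ("name", "value") and the point coordinates are made up for illustration, and it assumes a pandas version that supports the `columns` keyword of `drop`.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical in-memory data standing in for gpd.read_file('shp1.shp')
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"], "value": [1, 2]},
    geometry=[Point(0, 0), Point(1, 1)],
)

# Plain DataFrame that still carries the 'geometry' column
df_with_geom = pd.DataFrame(gdf)

# Plain DataFrame with the 'geometry' column removed
df_no_geom = pd.DataFrame(gdf.drop(columns="geometry"))

print(type(df_with_geom).__name__, list(df_with_geom.columns))
print(type(df_no_geom).__name__, list(df_no_geom.columns))
```

Dropping by name rather than by position (as `iloc[:, :-1]` does) avoids relying on the geometry column being the last one.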

Two notes:

  • It is often not necessary to convert a GeoDataFrame to a normal DataFrame, because most DataFrame methods work on a GeoDataFrame as well. Of course, there are a few cases where the conversion is indeed needed (e.g. to plot the data without the geometries), and then the above method is the best way.
  • The first way (df1 = pd.DataFrame(gdf)) will not take a copy of the data in the GeoDataFrame. This is often good from an efficiency point of view, but depending on what you want to do with the DataFrame, you might want an actual copy: df1 = pd.DataFrame(gdf, copy=True)
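
To illustrate the second note, here is a small sketch of the `copy=True` variant. The data is hypothetical, and the example only demonstrates the explicit copy; whether the no-copy form shares memory with the original can depend on the pandas version and its copy-on-write settings, so that case is not asserted here.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical in-memory GeoDataFrame used only for this illustration
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"], "value": [1, 2]},
    geometry=[Point(0, 0), Point(1, 1)],
)

# Explicit copy: the new DataFrame owns its own data
df_copy = pd.DataFrame(gdf, copy=True)

# Modify the copy and check that the GeoDataFrame is left untouched
df_copy.loc[0, "value"] = 99
original_value = gdf.loc[0, "value"]
print(original_value, df_copy.loc[0, "value"])
```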
joris
  • Thanks, it's very helpful. A note: `gdf.drop(columns='geometry')` with the `columns` keyword only works since pandas version 0.21, which is relatively recent. It doesn't work for me and it may not work for others. – jberrio Mar 27 '18 at 07:23
  • Yes, that's true. The alternative is `gdf.drop('geometry', axis=1)`; I will add that. – joris Mar 27 '18 at 07:43
  • One important note (applicable at least to pandas 1.0.5): if you only construct the new DataFrame with pd.DataFrame(geopandas_df), there is no guarantee that the Series in the new pandas DataFrame are no longer backed by a geopandas array. This can cause several "method not implemented" errors when invoking pandas methods. – Ivan Sudos Nov 28 '20 at 22:43
  • @ИванСудос Does that mean that converting the GeoDataFrame to a NumPy array is the safest way to make the conversion (e.g. using the code in the original question)? Or is there a better alternative you can suggest? – jberrio May 19 '21 at 00:18
  • @jberrio Well, I mostly resolve this by structuring my code so that I avoid non-trivial pandas operations on geopandas objects, and I find that to be the best way. But in cases where it is really needed, I agree with you and suggest the .to_numpy() method, since it doesn't copy anything unless the copy parameter is specified. – Ivan Sudos May 19 '21 at 18:00
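
The `.to_numpy()` route discussed in these comments could look roughly like the sketch below. The sample data is hypothetical, and the `infer_objects()` step is an addition of this example, not something the commenters proposed: going through a NumPy array of mixed columns yields object-typed columns, and `infer_objects()` is one way to recover numeric dtypes afterwards.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical in-memory GeoDataFrame used only for this illustration
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"], "value": [1, 2]},
    geometry=[Point(0, 0), Point(1, 1)],
)

# Round-trip through a NumPy array so no column stays backed by a geopandas array
df_plain = pd.DataFrame(gdf.to_numpy(), columns=gdf.columns)

# Mixed columns come back as object dtype; let pandas re-infer simple dtypes
df_plain = df_plain.infer_objects()

# The geometry column now holds ordinary Python (shapely) objects
geometry_dtype = df_plain["geometry"].dtype
print(df_plain.dtypes)
```

If the geometries are not needed at all, dropping the 'geometry' column before converting is simpler and sidesteps the dtype question entirely.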