I (think I) need to write a custom aggregation function for the `geopandas.GeoDataFrame.dissolve()` operation. When merging multiple polygons, I want to keep the information of the polygon with the largest area that also fulfils other criteria. The operation works fine, but afterwards all attributes of my GeoDataFrame are of dtype `object`.
The same issue happens with a regular pandas `groupby()`, so I have simplified the example below. Can someone tell me whether I should write my `custom_sort()` differently to keep the dtypes intact?
```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B'],
    'ints': [1, 2, 3, 4],
    'floats': [1.0, 2.0, 2.2, 3.2],
    'strings': ['foo', 'bar', 'baz', 'qux'],
    'bools': [True, True, True, False],
    'test': ['drop this', 'keep this', 'keep this', 'drop this'],
})


def custom_sort(df):
    """Define custom aggregation function with special sorting."""
    df = df.sort_values(by=['bools', 'floats'], ascending=False)
    return df.iloc[0]


print(df)
print(df.dtypes)
print()

grouped = df.groupby(by='group').agg(custom_sort)
print(grouped)
print(grouped.dtypes)  # Issue: All dtypes are object
print()

print(grouped.convert_dtypes().dtypes)  # Possible solution, but not for me
# Please note that I cannot use convert_dtypes(). I actually need this for
# geopandas.GeoDataFrame.dissolve() and I think convert_dtypes() messes up
# the geometry information
```
Output:
```
   group  ints  floats  strings  bools       test
0      A     1     1.0      foo   True  drop this
1      A     2     2.0      bar   True  keep this
2      B     3     2.2      baz   True  keep this
3      B     4     3.2      qux  False  drop this
group       object
ints         int64
floats     float64
strings     object
bools         bool
test        object
dtype: object

       ints  floats  strings  bools       test
group
A         2     2.0      bar   True  keep this
B         3     2.2      baz   True  keep this
ints       object
floats     object
strings    object
bools      object
test       object
dtype: object

ints         Int64
floats     Float64
strings     string
bools      boolean
test        string
dtype: object
```
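A minimal sketch, using plain pandas only (no geopandas), of what probably causes the symptom and of one dtype-preserving alternative. `custom_sort()` returns `df.iloc[0]`, i.e. a single row of a frame whose columns have different dtypes; a Series holds exactly one dtype, so that row is `object`. My assumption is that the aggregated frame is then assembled from these object rows, which would explain why every column ends up `object`. The names `ordered`, `best` and `restored` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B'],
    'ints': [1, 2, 3, 4],
    'floats': [1.0, 2.0, 2.2, 3.2],
    'strings': ['foo', 'bar', 'baz', 'qux'],
    'bools': [True, True, True, False],
    'test': ['drop this', 'keep this', 'keep this', 'drop this'],
})

# 1) Likely cause: one row of a mixed-dtype frame is a single Series, and a
#    Series has exactly one dtype, so pandas falls back to object.
row = df.sort_values(by=['bools', 'floats'], ascending=False).iloc[0]
print(row.dtype)  # object

# 2) Alternative to the custom aggregation: sort the whole frame once with the
#    same criteria as custom_sort() (True before False, then largest 'floats')
#    and keep the first row of each group. drop_duplicates() selects whole
#    rows, so every column keeps its original dtype.
ordered = df.sort_values(by=['bools', 'floats'], ascending=False)
best = ordered.drop_duplicates(subset='group').set_index('group').sort_index()
print(best)
print(best.dtypes)

# 3) If an all-object result already exists, infer_objects() re-infers NumPy
#    dtypes (int64, float64, bool) for object columns holding only such
#    values, without the nullable dtypes that convert_dtypes() introduces.
restored = best.astype(object).infer_objects()
print(restored.dtypes)
```

I have not checked either approach against `GeoDataFrame.dissolve()`. Whether `infer_objects()` leaves the geometry column untouched, or whether the attribute table `best` can simply be joined back onto the dissolved geometries by the group key, are assumptions that would need to be verified.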