Can pandas groupby use groupby.apply(func) and, inside func, use another instance of .apply() without duplicating and overwriting data? In a way, the use of .apply() is nested.
Python 3.7.3
pandas==0.25.1
import pandas as pd


def dummy_func_nested(row):
    row['new_col_2'] = row['value'] * -1
    return row


def dummy_func(df_group):
    df_group['new_col_1'] = None
    # apply dummy_func_nested
    df_group = df_group.apply(dummy_func_nested, axis=1)
    return df_group


def pandas_groupby():
    # initialize data
    df = pd.DataFrame([
        {'country': 'US', 'value': 100.00, 'id': 'a'},
        {'country': 'US', 'value': 95.00, 'id': 'b'},
        {'country': 'CA', 'value': 56.00, 'id': 'y'},
        {'country': 'CA', 'value': 40.00, 'id': 'z'},
    ])
    # group by country and apply first dummy_func
    new_df = df.groupby('country').apply(dummy_func)
    # new_df and df should have the same list of countries
    assert new_df['country'].tolist() == df['country'].tolist()
    print(df)


if __name__ == '__main__':
    pandas_groupby()
The above code should return:
  country  value id new_col_1  new_col_2
0      US  100.0  a      None     -100.0
1      US   95.0  b      None      -95.0
2      CA   56.0  y      None      -56.0
3      CA   40.0  z      None      -40.0
However, the code returns:
  country  value id new_col_1  new_col_2
0      US  100.0  a      None     -100.0
1      US   95.0  a      None      -95.0
2      US   56.0  a      None      -56.0
3      US   40.0  a      None      -40.0
This behavior appears to happen only when both groups have the same number of rows. If one group has more rows than the other, the output is as expected.
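For comparison, here is a minimal sketch of two variants that avoid mutating the objects handed to the applied functions. The pandas documentation cautions that functions passed to apply() which mutate their input can produce unexpected behavior. Whether that is what causes the duplication on pandas 0.25.1 is an assumption here, not something confirmed. The names add_negated_value and add_columns are hypothetical stand-ins for dummy_func_nested and dummy_func, and the groups are iterated explicitly and concatenated instead of calling groupby.apply, so the sketch does not depend on how a particular pandas version treats the grouping column.

```python
import pandas as pd


def add_negated_value(row):
    # Hypothetical replacement for dummy_func_nested:
    # copy the row first so the Series passed in by apply() is never mutated.
    row = row.copy()
    row['new_col_2'] = row['value'] * -1
    return row


def add_columns(df_group):
    # Hypothetical replacement for dummy_func:
    # copy the group before adding a column, for the same reason.
    df_group = df_group.copy()
    df_group['new_col_1'] = None
    return df_group.apply(add_negated_value, axis=1)


df = pd.DataFrame([
    {'country': 'US', 'value': 100.00, 'id': 'a'},
    {'country': 'US', 'value': 95.00, 'id': 'b'},
    {'country': 'CA', 'value': 56.00, 'id': 'y'},
    {'country': 'CA', 'value': 40.00, 'id': 'z'},
])

# Variant 1: keep the nested row-wise apply, but process each group explicitly.
pieces = [add_columns(group) for _, group in df.groupby('country')]
# groupby sorts the group keys, so restore the original row order afterwards.
new_df = pd.concat(pieces).sort_index()

# Variant 2: no apply at all; both columns are computed in one vectorized step.
vectorized_df = df.assign(new_col_1=None, new_col_2=df['value'] * -1)
```

If the per-row logic is as simple as in the example, the vectorized variant is usually the easier one to reason about, since no function ever receives an intermediate object it could modify.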