I am trying to understand how to use the groupby().apply() function in Pandas, so I made a simple dummy program that would print the grouped dataframe for each group:
import pandas as pd

def dummy(df):
    # Print each group as apply() passes it in, then return it unchanged
    print(df)
    return df

df_original = pd.DataFrame({'A': ['a,a,a,a', 'b,b,b', 'c', 'd,d,d', 'e'], 'B': [0, 0, 1, 1, 2]})
print(df_original)
df2 = df_original.groupby('B').apply(dummy)
The output I get, however, shows the first group printed twice, as if apply() had called dummy() on it twice:
# original dataframe
         A  B
0  a,a,a,a  0
1    b,b,b  0
2        c  1
3    d,d,d  1
4        e  2
# output of dummy()
         A  B
0  a,a,a,a  0
1    b,b,b  0
         A  B
0  a,a,a,a  0
1    b,b,b  0
       A  B
2      c  1
3  d,d,d  1
   A  B
4  e  2
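To check whether the duplicate is a real second call rather than a printing quirk, I think a sketch like the following would help. It counts how often each group reaches the function and compares that with plain iteration over the groups. The names counting_dummy, calls and iterated are my own and not part of the program above, and the counts may well differ between pandas versions, so I am not assuming any particular result:

```python
from collections import Counter

import pandas as pd

df_original = pd.DataFrame({'A': ['a,a,a,a', 'b,b,b', 'c', 'd,d,d', 'e'],
                            'B': [0, 0, 1, 1, 2]})

# Hypothetical helper: records the row labels of every group it receives,
# so a repeated call on the same group shows up as a count above 1.
calls = Counter()

def counting_dummy(df):
    calls[tuple(df.index.tolist())] += 1
    return df

df_original.groupby('B').apply(counting_dummy)
print(dict(calls))

# Plain iteration over the same groups, without apply(), for comparison.
iterated = [(int(key), group.index.tolist())
            for key, group in df_original.groupby('B')]
print(iterated)
```

If the count for rows (0, 1) comes out as 2 while plain iteration lists every group once, the extra call is coming from apply() itself and not from the grouping.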
I cannot understand how something so simple can go wrong. Why does dummy() receive the first group twice?