Consider a dataframe that looks like this:
>>> df
   A  B   C
0  1  4  10
1  2  5  11
2  3  6  12
3  1  7  13
4  2  8  14
5  3  9  15
6  1  4  16
7  2  5  17
8  3  6  18
Next, consider creating a DataFrameGroupBy object by grouping the dataframe on the 'A' column with the pandas DataFrame.groupby function. Finally, we will apply the following user-defined function to the DataFrameGroupBy object using the DataFrameGroupBy.apply method:
>>> def do_group_stuff(grp, grpname):
...     print(f"grpname: {grpname}")
...     grp.apply(lambda row: print(row), axis=1)
...
>>> df.groupby(['A']).apply(lambda grp: do_group_stuff(grp,grp.name))
I expect there to be three groups in the DataFrameGroupBy object corresponding to the three values seen in the 'A' column of df and the output to look something like this:
grpname: 1
A 1
B 4
C 10
Name: 0, dtype: int64
A 1
B 7
C 13
Name: 3, dtype: int64
A 1
B 4
C 16
Name: 6, dtype: int64
grpname: 2
A 2
B 5
C 11
Name: 1, dtype: int64
A 2
B 8
C 14
Name: 4, dtype: int64
A 2
B 5
C 17
Name: 7, dtype: int64
grpname: 3
A 3
B 6
C 12
Name: 2, dtype: int64
A 3
B 9
C 15
Name: 5, dtype: int64
A 3
B 6
C 18
Name: 8, dtype: int64
But in reality the output looks like this:
grpname: 1
A 1
B 4
C 10
Name: 0, dtype: int64
A 1
B 7
C 13
Name: 3, dtype: int64
A 1
B 4
C 16
Name: 6, dtype: int64
grpname: 1
A 1
B 4
C 10
Name: 0, dtype: int64
A 1
B 7
C 13
Name: 3, dtype: int64
A 1
B 4
C 16
Name: 6, dtype: int64
grpname: 2
A 2
B 5
C 11
Name: 1, dtype: int64
A 2
B 8
C 14
Name: 4, dtype: int64
A 2
B 5
C 17
Name: 7, dtype: int64
grpname: 3
A 3
B 6
C 12
Name: 2, dtype: int64
A 3
B 9
C 15
Name: 5, dtype: int64
A 3
B 6
C 18
Name: 8, dtype: int64
where the "1" group is repeated twice for some reason. Any ideas why this is the case?
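For anyone who wants to reproduce this without reading through the printed rows, here is a minimal sketch that rebuilds df and records the group name on every call instead of printing. The names record_group and calls are my own hypothetical helpers, not part of pandas, and I group with 'A' rather than ['A'] so the group name stays a plain scalar. Whether the first group shows up twice may depend on the installed pandas version: older pandas documentation notes that apply can call the function twice on the first group to choose a code path, so I would not assume a fixed call count.

```python
import pandas as pd

# Rebuild the dataframe shown at the top of the question.
df = pd.DataFrame({
    "A": [1, 2, 3] * 3,
    "B": [4, 5, 6, 7, 8, 9, 4, 5, 6],
    "C": list(range(10, 19)),
})

# Hypothetical helper: collects the group names in the order apply passes them.
calls = []

def record_group(grp):
    # grp.name is the group key that apply attaches to each sub-frame.
    calls.append(grp.name)
    # Return something simple so apply has a result to combine.
    return len(grp)

result = df.groupby("A").apply(record_group)

# Inspect how many times each group was actually passed to the function.
print(calls)
print(result)
```

If calls contains the first key twice, the duplication comes from how apply invokes the function rather than from the data, which would mean functions with side effects (such as print) are not a reliable way to count groups.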