I am trying to understand how the groupby().apply() function works in pandas, so I wrote a simple dummy program that prints the grouped DataFrame for each group:

import pandas as pd

def dummy(df):
    # Print each group passed in by apply, then return it unchanged
    print(df)
    return df

df_original = pd.DataFrame({'A': ['a,a,a,a','b,b,b','c','d,d,d', 'e'], 'B': [0, 0, 1, 1, 2]})
print(df_original)

df2 = df_original.groupby('B').apply(dummy)

The output I get, however, shows that the first group is printed twice, as if apply iterated over it twice:

# original dataframe
         A  B
0  a,a,a,a  0
1    b,b,b  0
2        c  1
3    d,d,d  1
4        e  2

# output of dummy()
         A  B
0  a,a,a,a  0
1    b,b,b  0
         A  B
0  a,a,a,a  0
1    b,b,b  0
       A  B
2      c  1
3  d,d,d  1
   A  B
4  e  2

I cannot understand how something so simple can go wrong.

maruko

1 Answer

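You can read what went wrong in the reference suggested by @Gwendal. In short, older pandas versions documented that groupby().apply() calls the function twice on the first group to decide whether it can take a fast or a slow code path, so side effects such as print() show up twice for that group.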
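To check whether your installed pandas version calls the function more often than once per group, a rough sketch is to count the calls with a deliberately side-effecting function and compare the count with the number of groups (ngroups). The names count_calls, calls and n_groups are illustrative only, not part of the original code:

```python
import pandas as pd

df_original = pd.DataFrame({'A': ['a,a,a,a', 'b,b,b', 'c', 'd,d,d', 'e'],
                            'B': [0, 0, 1, 1, 2]})

calls = 0  # hypothetical counter, incremented on every call of count_calls


def count_calls(group):
    # Side effect used only for diagnosis: record each invocation by apply
    global calls
    calls += 1
    return group


grouped = df_original.groupby('B')
n_groups = grouped.ngroups  # number of distinct values in column B

# Selecting only column A keeps the grouping column B out of each group
grouped[['A']].apply(count_calls)

print(f"apply called the function {calls} times for {n_groups} groups")
```

On a version that evaluates every group once the two numbers should match; on a version that evaluates the first group twice, the call count would be one higher.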

If you just want to print each group, a quick fix is to loop over the unique values of B yourself:

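import pandas as pd

df_original = pd.DataFrame({'A': ['a,a,a,a','b,b,b','c','d,d,d', 'e'], 'B': [0, 0, 1, 1, 2]})

# Print the rows belonging to each distinct value of B, exactly once per value
for value in df_original['B'].unique():
    print(df_original[df_original['B'] == value])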

Output

         A  B
0  a,a,a,a  0
1    b,b,b  0
       A  B
2      c  1
3  d,d,d  1
   A  B
4  e  2
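Another option that stays closer to the original groupby code is to iterate over the GroupBy object directly: each iteration yields a (key, group) pair, so the print runs once per group and no apply is involved. A minimal sketch (the seen_keys list is a hypothetical addition, used only to record the keys):

```python
import pandas as pd

df_original = pd.DataFrame({'A': ['a,a,a,a', 'b,b,b', 'c', 'd,d,d', 'e'],
                            'B': [0, 0, 1, 1, 2]})

seen_keys = []  # hypothetical: records each group key as it is visited

# Iterating a GroupBy yields (key, sub-DataFrame) pairs, sorted by key by default
for key, group in df_original.groupby('B'):
    seen_keys.append(key)
    print(group)
```

This prints the same three groups as the loop above, one per distinct value of B.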
QuantStats