Here is a simple DataFrame:

import numpy as np
import pandas as pd

dd = pd.DataFrame(np.arange(35).reshape(7, 5), columns=list('xyzwv'))
dd['w'] = list('AABBBCC')

which is:

[image: the resulting DataFrame]

Now I try the following code:

def func(x):
    print(x)
    return x

dd.groupby('w').apply(func)

Then it prints out:

[image: printed output of func for each group]

I think something is going wrong because one group's output is printed twice.

It looks as if func() is being called twice for the same group. What mistake did I make?

Royalblue
  • Does this answer your question? [Pandas GroupBy.apply method duplicates first group](https://stackoverflow.com/questions/21390035/pandas-groupby-apply-method-duplicates-first-group) – Trenton McKinney Aug 30 '20 at 06:21

1 Answer


apply calls func twice on the first group to decide whether it can take a fast or slow code path, so any side effect in func, such as print, shows up twice for that group. See http://pandas.pydata.org/pandas-docs/stable/groupby.html#flexible-apply
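To check how your own pandas version behaves, one option is to count the calls instead of printing. The following is only a sketch: the `calls` list and the `counts` name are made up for illustration, and the column selection `[['x', 'y']]` is an addition to the question's code, there only because some recent pandas versions emit a deprecation warning when apply operates on the grouping column.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Same DataFrame as in the question.
dd = pd.DataFrame(np.arange(35).reshape(7, 5), columns=list('xyzwv'))
dd['w'] = list('AABBBCC')

calls = []  # hypothetical helper: records one entry per call to func


def func(x):
    # Identify the group by the index label of its first row
    # (0 for group A, 2 for group B, 5 for group C).
    calls.append(int(x.index[0]))
    return x


# Assumption: selecting non-grouping columns does not change how often
# func is called; it is only here to avoid the deprecation warning.
dd.groupby('w')[['x', 'y']].apply(func)

counts = Counter(calls)
print(pd.__version__, dict(counts))
```

If the count for the first group (label 0) is 2, the installed version still evaluates the first group twice; according to the 0.25 release note linked in the comments, newer versions should report 1 for every group.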

ePak
  • I don't see any optimization-related notes at the link you gave, but calling func twice for that reason makes sense. Thanks for the explanation! – Royalblue Nov 20 '17 at 06:27
  • @Royalblue This answer is no longer correct for pandas 0.25 and later: [Groupby.apply on DataFrame evaluates first group only once](https://pandas.pydata.org/pandas-docs/version/0.25.3/whatsnew/v0.25.0.html#groupby-apply-on-dataframe-evaluates-first-group-only-once) – Trenton McKinney Aug 30 '20 at 06:39