Python - Pandas Groupby - When to Use apply() / agg()

Question

I recently came across an interesting question about the difference between .agg() and .apply() on a Pandas groupby(). I read the helpful post "Pandas difference between apply() and aggregate() functions".

It clarified a lot, but I am still a bit confused about when to use .agg() and when to use .apply().

Demo dataset:

import pandas as pd
import numpy as np

df_min = pd.DataFrame({"A":[0.0,0.0,np.nan,0.0,0.42832,np.nan,0.62747,0.69856],
                   "B":[0.42832,0.69856,0.75865,0.42832,0.62747,0.27024,0.42832,np.nan], 
                   "C":[0,0,1,0,0,0,0,0]})

Sample Dataset

         A        B  C
0  0.00000  0.42832  0
1  0.00000  0.69856  0
2      NaN  0.75865  1
3  0.00000  0.42832  0
4  0.42832  0.62747  0
5      NaN  0.27024  0
6  0.62747  0.42832  0
7  0.69856      NaN  0

The objective: fill the np.nan values via a groupby statement.

My code is listed below:

# In my real data the grouping column is 'transportation_issues';
# in the demo dataset it corresponds to the binary column 'C'.
fill_na = lambda x: x.fillna(x.mean())
df_min.groupby('C').apply(fill_na)
df_min.groupby('C').agg(fill_na)

When I used .apply(), the code did its job and returned the result. But when I used .agg(), a ValueError occurred:

ValueError: Shape of passed values is (3, 2), indices imply (2, 2)
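
To make the shape mismatch concrete, here is a minimal sketch of what each call returns on the demo dataset. It assumes the grouping column is C, uses transform() as suggested in the comments, and passes group_keys=False to apply() so the original row labels are kept. Exact behavior and error messages may differ between pandas versions.

```python
import numpy as np
import pandas as pd

df_min = pd.DataFrame({
    "A": [0.0, 0.0, np.nan, 0.0, 0.42832, np.nan, 0.62747, 0.69856],
    "B": [0.42832, 0.69856, 0.75865, 0.42832, 0.62747, 0.27024, 0.42832, np.nan],
    "C": [0, 0, 1, 0, 0, 0, 0, 0],
})


def fill_na(x):
    # Returns a Series as long as its input, so it is not a reducer.
    return x.fillna(x.mean())


# agg() with a reducer: one row per group, one value per column.
group_means = df_min.groupby("C").agg("mean")

# transform(): the result has the same length and index as the input,
# which is the shape fill_na naturally produces.
filled = df_min.groupby("C").transform(fill_na)

# apply() on a single column; group_keys=False (an assumption made here)
# keeps the original row labels instead of adding the group key.
applied_a = df_min.groupby("C", group_keys=False)["A"].apply(fill_na).sort_index()

print(group_means.shape)  # (2, 2)
print(filled.shape)       # (8, 2)
```

Note that group C == 1 contains only NaN in column A, so its mean is NaN and row 2 stays unfilled in this sketch.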

So, my questions are:

1: Why did .agg() not work?

2: What should I do to make the user-defined function work with .agg()?

3: When applying a user-defined function on groupby(), when should I use .apply() and when should I use .agg()?

4: In groupby(), is it true that .apply() operates on the whole dataset while .agg() operates on individual columns?

5: Under the hood, how do .apply() and .agg() differ from each other?

Thank you so much for answering my question. I really appreciate your help!

Please post a sample of data close to actual used in code. See [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/1422451). — Parfait, Oct 24 '20 at 15:48
Does the `transform()` construct in [this post](https://stackoverflow.com/questions/19966018) help? — Bill Huang, Oct 24 '20 at 18:30
@NYCCoder Hey there, the transportation_issues represent the binary column on the demo dataset. — vae, Oct 25 '20 at 21:44
@Parfait Hey there, thank you for your advice! I will follow the advice on your post in future questions. — vae, Oct 25 '20 at 21:45
@BillHuang Hey there, thank you for your answer. I know how to fill the missing value, but here, I am just confused about the application of .agg() & .apply(). — vae, Oct 25 '20 at 21:46
