Issue when I groupby & fill na with the min date value by the grouped element

Question

This is my dataset, and I am trying to fill the date columns that contain NaN with the minimum value.

I want to fill the NaN values in each date column with the minimum of that column for the ag_id (the grouped-by element). When I run the code below, I get unexpected output.

df_test_revenue_1["1st_rev_month"] = df_test_revenue_1.groupby("ag_id").transform(lambda x: x.fillna(x.min()))

Unexpected output from executing the above:

I was expecting all the values of the column 1st_rev_month to be 2017-10-01. Instead, it looks like it is picking the values for 1st_rev_month from the column revenue_month.

My end goal is to get this result (applying the same logic to the rest of the date columns, except the revenue_month column).

post text, not screen shots: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — Paul H, Jul 13 '20 at 15:29

score 0 · Accepted Answer · answered Jul 13 '20 at 15:45

Your code:

df_test_revenue_1.groupby("ag_id").transform(lambda x: x.fillna(x.min()))

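never refers to the column "1st_rev_month". You only mention it on the left-hand side, which means "put the result in the 1st_rev_month column". But what is the result? The transform runs on every column of the DataFrame other than ag_id, not just on "1st_rev_month", so each column is filled with its own per-group minimum and a whole DataFrame comes back.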
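To see what the right-hand side actually produces, here is a minimal sketch on made-up data. The column names follow the question, but the values are assumptions, since the original dataset was only posted as a screenshot.

```python
import pandas as pd

# Hypothetical data: column names come from the question, values are invented.
df = pd.DataFrame({
    "ag_id": [1, 1, 1, 2, 2],
    "revenue_month": pd.to_datetime(
        ["2017-10-01", "2017-11-01", "2017-12-01", "2018-01-01", "2018-02-01"]
    ),
    "1st_rev_month": pd.to_datetime(
        ["2017-10-01", None, None, None, "2018-01-01"]
    ),
})

# No column is selected after groupby, so transform works on all
# non-grouping columns and returns a DataFrame, not a single Series.
result = df.groupby("ag_id").transform(lambda x: x.fillna(x.min()))

print(type(result).__name__)
print(list(result.columns))
```

The result contains revenue_month as well as 1st_rev_month, so it is not a single column that can be meaningfully assigned to df["1st_rev_month"].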

You need to take the minimum only over "1st_rev_month" column.

The fix:

df_test_revenue_1["1st_rev_month"]=df_test_revenue_1.groupby("ag_id")['1st_rev_month'].transform(lambda x: x.fillna(x.min()))

Adding ['1st_rev_month'] after the groupby selects only the relevant column, so the minimum is taken over that column alone within each ag_id group.
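For the end goal of applying the same logic to the other date columns, one possible approach is to loop over them and select each column after the groupby. This is only a sketch: the data values and the column name last_rev_month are hypothetical, since the question does not list the remaining columns.

```python
import pandas as pd

# Hypothetical data: "last_rev_month" is an invented second date column.
df = pd.DataFrame({
    "ag_id": [1, 1, 1, 2, 2],
    "revenue_month": pd.to_datetime(
        ["2017-10-01", "2017-11-01", "2017-12-01", "2018-01-01", "2018-02-01"]
    ),
    "1st_rev_month": pd.to_datetime(
        ["2017-10-01", None, None, None, "2018-01-01"]
    ),
    "last_rev_month": pd.to_datetime(
        ["2017-12-01", None, "2017-11-01", "2018-02-01", None]
    ),
})

# Every column except the grouping key and revenue_month gets filled.
date_cols = [c for c in df.columns if c not in ("ag_id", "revenue_month")]

for col in date_cols:
    # Selecting [col] restricts the transform to that one column, so
    # x.min() is the minimum of that column within each ag_id group.
    df[col] = df.groupby("ag_id")[col].transform(lambda x: x.fillna(x.min()))

print(df)
```

A group whose values are all missing in a given column has no minimum to fill with, so its entries in that column stay missing.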
