
This is my dataset, and I am trying to fill the date columns that have NaN with the min value.

I am trying to fill the NaN in the date columns with the min from that date column for the ag_id (grouped-by element). When I do the below I get unexpected output.

df_test_revenue_1["1st_rev_month"] = df_test_revenue_1.groupby("ag_id").transform(lambda x: x.fillna(x.min()))

The output from executing the above is not what I expected. I was expecting all the values in the 1st_rev_month column to be 2017-10-01. Instead, it looks like it is picking the values for 1st_rev_month from the revenue_month column.

My end goal is to apply the same logic to the rest of the date columns (except the revenue_month column).

ALollz
zoomi
  • post text, not screen shots: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Paul H Jul 13 '20 at 15:29

1 Answer


Your code:

df_test_revenue_1.groupby("ag_id").transform(lambda x: x.fillna(x.min()))

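never refers to the column "1st_rev_month". You only refer to it on the left-hand side, which means "put the result in the 1st_rev_month column". But what is the result? Without a column selection, transform applies the lambda to every non-grouping column and returns a whole DataFrame, so the minimum is computed for all columns of the dataframe, not just for "1st_rev_month".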
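To see this on a small scale, here is a minimal sketch. The toy DataFrame `df` and its values are made up for illustration; only the column names ag_id, revenue_month and 1st_rev_month come from the question.

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are invented.
df = pd.DataFrame({
    "ag_id": [1, 1, 1],
    "revenue_month": pd.to_datetime(["2017-10-01", "2017-11-01", "2017-12-01"]),
    "1st_rev_month": pd.to_datetime(["2017-10-01", None, None]),
})

# No column is selected after groupby, so the lambda runs on every
# non-grouping column and the result is a DataFrame, not a single Series.
filled_all = df.groupby("ag_id").transform(lambda x: x.fillna(x.min()))
result_columns = list(filled_all.columns)
print(type(filled_all).__name__, result_columns)
```

The result has one column per non-grouping column of the input, which is why assigning it to a single column does not give the values you expected.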

You need to take the minimum only over "1st_rev_month" column.

The fix:

df_test_revenue_1["1st_rev_month"] = df_test_revenue_1.groupby("ag_id")["1st_rev_month"].transform(lambda x: x.fillna(x.min()))

Adding ['1st_rev_month'] after groupby selects only the relevant column, so the minimum is taken per ag_id over that column alone.
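For the end goal of applying the same logic to the other date columns, one possible approach is to loop over them and select each column after groupby. This is only a sketch: the toy data is invented, and `last_rev_month` and the `date_cols` list are hypothetical stand-ins for your real date columns.

```python
import pandas as pd

# Hypothetical toy data: values are invented, and last_rev_month is an
# assumed second date column that does not appear in the question.
df = pd.DataFrame({
    "ag_id": [1, 1, 1, 2, 2],
    "revenue_month": pd.to_datetime(
        ["2017-10-01", "2017-11-01", "2017-12-01", "2018-01-01", "2018-02-01"]
    ),
    "1st_rev_month": pd.to_datetime(
        ["2017-10-01", None, None, None, "2018-01-01"]
    ),
    "last_rev_month": pd.to_datetime(
        [None, "2017-12-01", None, "2018-02-01", None]
    ),
})

# Every date column to fill, leaving revenue_month out on purpose.
date_cols = ["1st_rev_month", "last_rev_month"]

for col in date_cols:
    # Selecting [col] after groupby restricts the lambda to that one column,
    # so each NaT is replaced by the minimum of the same column within its ag_id.
    df[col] = df.groupby("ag_id")[col].transform(lambda x: x.fillna(x.min()))

print(df)
```

Because each column is selected before transform, revenue_month is never touched and no column borrows values from another.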

Roim