I am trying to group by multiple columns and fill the NaN values in multiple columns at the same time. I am attaching a picture of what the data looks like, as well as the code I am having issues with. This is sample data that I created to mirror the actual data, which is confidential.
There are 4 columns: name, plant, length and width. There are 3 different types of plant. Each of the last 3 columns (plant, length and width) has missing data. My end goal is to create a model to guess which plant types are missing. To do that, I am first attempting to impute the mean length and width for each name/plant combination into the missing values of those columns.
The example below calculates the means, which works. Where I am failing is inserting them to fill the NaN values.
lengthmean = df.groupby(['name', 'plant']).length.mean()
print(lengthmean)
I get a result that looks like this:
name    plant
Brian   plant 3    2.500000
        plant1     1.850000
        plant2     2.450000
Jeff    plant 3    4.100000
        plant1     2.333333
        plant2     2.100000
Justin  plant 3    2.900000
        plant1     1.900000
        plant2     2.850000
Zach    plant 3    1.750000
        plant1     2.650000
        plant2     3.300000
I am also attempting to do multiple columns at once (both length and width in this case, but in my real data it is more than that). Below is the code that is failing for me.
df[['length','width']] = df.groupby(['name', 'plant'])['length','width']\
.transform(lambda x: x.fillna(x.mean()))
I am receiving this error 'ValueError: Length mismatch: Expected axis has 32 elements, new values have 40 elements'
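For illustration, here is a minimal sketch of the fill being attempted, on a small made-up frame (the values and the names `cols`, `group_means` and `filled` are hypothetical, not the real data). It rests on an unconfirmed assumption: that the 32 vs 40 mismatch comes from rows where plant itself is missing, since groupby skips NaN keys by default (`dropna=True`). Instead of assigning the transform result straight back, the sketch computes the per-group means and lets `fillna` align them on the index. It assumes a reasonably recent pandas, where `transform` keeps the original index.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: row 2 has a missing group key ('plant').
df = pd.DataFrame({
    "name":   ["Brian", "Brian", "Brian", "Jeff", "Jeff", "Jeff"],
    "plant":  ["plant1", "plant1", np.nan, "plant2", "plant2", "plant2"],
    "length": [1.0, np.nan, 5.0, 2.0, 4.0, np.nan],
    "width":  [np.nan, 2.0, 1.0, 3.0, np.nan, 5.0],
})

cols = ["length", "width"]

# Note the double brackets: a list of columns, not a tuple.
# transform("mean") broadcasts each group's mean back onto its rows.
group_means = df.groupby(["name", "plant"])[cols].transform("mean")

# fillna with a DataFrame aligns on index and column labels, so only
# missing cells that have a matching group mean are filled; rows whose
# group key is NaN have no group mean and are left unchanged.
filled = df.copy()
filled[cols] = df[cols].fillna(group_means)

print(filled)
```

Rows with a missing plant keep whatever length and width they already had, so any NaN in those rows would still need separate handling (for example, a fallback to the per-name mean).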
I would appreciate any help, thank you!