I have two dataframes, train and test. The test set has missing values in one column.
import numpy as np
import pandas as pd
train = [[0,1],[0,2],[0,3],[0,7],[0,7],[1,3],[1,5],[1,2],[1,2]]
test = [[0,0],[0,np.nan],[1,0],[1,np.nan]]
train = pd.DataFrame(train, columns = ['A','B'])
test = pd.DataFrame(test, columns = ['A','B'])
The test set has two missing values in column B. They should be filled using statistics of B computed from the train rows that have the same value in column A (the groupby column):
- If the imputation strategy is mode, then the missing values should be imputed with 7 and 2.
- If the imputation strategy is mean, then the missing values should be (1+2+3+7+7)/5 = 4 and (3+5+2+2)/4 = 3.
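As a quick check of the expected fill values, the per-group statistics can be computed from train alone with groupby. This is a minimal sketch; taking the first value of Series.mode() is an assumption about how ties should be broken, since the question does not say.

```python
import pandas as pd

train = pd.DataFrame(
    [[0, 1], [0, 2], [0, 3], [0, 7], [0, 7], [1, 3], [1, 5], [1, 2], [1, 2]],
    columns=["A", "B"],
)

grouped = train.groupby("A")["B"]

# Per-group mean of B, computed from train only
group_mean = grouped.mean()

# Per-group mode of B; Series.mode() can return several values on ties,
# so this sketch keeps the first (smallest) one
group_mode = grouped.agg(lambda s: s.mode().iloc[0])

print(group_mean)
print(group_mode)
```

Group A=0 gives mean 4 and mode 7; group A=1 gives mean 3 and mode 2, matching the values listed above.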
What is a good way to do this?
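One possible approach, sketched here rather than offered as a definitive answer: compute the per-group statistic on train, look it up for each test row with Series.map, and pass the result to Series.fillna so that only the missing entries are replaced. The helper name impute_from_train and its parameters are hypothetical, chosen for illustration; they are not part of pandas.

```python
import numpy as np
import pandas as pd


def impute_from_train(train, test, group_col, target_col, strategy="mean"):
    """Fill NaNs in test[target_col] with per-group statistics of train.

    Hypothetical helper for illustration; not a pandas API.
    """
    grouped = train.groupby(group_col)[target_col]
    if strategy == "mean":
        stats = grouped.mean()
    elif strategy == "mode":
        # Series.mode() may return several values on ties; keep the first
        stats = grouped.agg(lambda s: s.mode().iloc[0])
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    out = test.copy()
    # Look up each test row's group statistic (groups absent from train give NaN)
    fill_values = out[group_col].map(stats)
    # fillna aligns on the index, so only the missing entries are replaced
    out[target_col] = out[target_col].fillna(fill_values)
    return out


train = pd.DataFrame(
    [[0, 1], [0, 2], [0, 3], [0, 7], [0, 7], [1, 3], [1, 5], [1, 2], [1, 2]],
    columns=["A", "B"],
)
test = pd.DataFrame([[0, 0], [0, np.nan], [1, 0], [1, np.nan]], columns=["A", "B"])

print(impute_from_train(train, test, "A", "B", strategy="mode"))
print(impute_from_train(train, test, "A", "B", strategy="mean"))
```

Because the statistics come only from train, the existing values in test do not influence the fill, which is the difference from the single-dataframe case, where groupby(...).transform on the same frame is usually enough.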
This question is related, but uses only one dataframe instead of two.