All,
I have a DataFrame with four columns ('key1', 'key2', 'data1', 'data2'). I inserted some NaN values into data1. Now I want to fill each NaN with the most frequent value within its group after I do groupby(['key1', 'key2']).
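```python
import numpy as np
import pandas as pd

dt = pd.DataFrame({'key1': np.random.choice(['a', 'b'], size=100),
                   'key2': np.random.choice(['c', 'd'], size=100),
                   # float dtype so the column can hold NaN
                   'data1': np.random.randint(5, size=100).astype(float),
                   'data2': np.random.randn(100)},
                  columns=['key1', 'key2', 'data1', 'data2'])
# insert NaN (.ix was removed in pandas 1.0; .loc sets the values in place)
dt.loc[[2, 6, 10], 'data1'] = np.nan
# group by key1 and key2
group = dt.groupby(['key1', 'key2'])['data1']
# the data is random, so your counts will differ from the ones below
group.value_counts(dropna=False)
```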
```
key1  key2  data1
a     c     1.0       8
            4.0       6
            0.0       4
            2.0       2
            3.0       1
      d     0.0       7
            1.0       6
            4.0       6
            2.0       5
            NaN       3
            3.0       1
b     c     0.0       7
            2.0       7
            1.0       3
            3.0       2
            4.0       2
      d     2.0      11
            1.0      10
            0.0       3
            3.0       3
            4.0       3
```
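Rather than reading the most frequent value off these counts by eye, it can be computed per group. The sketch below is a minimal illustration, assuming a small hypothetical stand-in for dt (fixed values instead of random ones, so the result is easy to check). It uses Series.mode(), which ignores NaN by default and returns the modes in sorted order, so a tie resolves to the smallest value. Note that .iloc[0] would raise an IndexError for a group made up entirely of NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic stand-in for the random `dt` in the question.
dt = pd.DataFrame({
    'key1': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
    'key2': ['d', 'd', 'd', 'd', 'c', 'c', 'c'],
    'data1': [0.0, 0.0, 1.0, np.nan, 2.0, 2.0, 3.0],
})

# mode() skips NaN and sorts its result, so iloc[0] is the most frequent
# value (the smallest one if several values tie).
group_mode = dt.groupby(['key1', 'key2'])['data1'].agg(
    lambda s: s.mode().iloc[0]
)
print(group_mode)
```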
For this example, what I want to do is fill the NaN values in the data1 column with 0.0, because all three NaNs fall in the group (key1=a, key2=d) and 0.0 is the most frequent value in that group.
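One way to do the fill is to apply a per-group function with groupby(...).transform, which returns a Series aligned to the original index that can be written straight back into the column. This is a sketch under the same assumptions as before: the frame is a small hypothetical stand-in, the helper name fill_with_group_mode is made up for illustration, ties go to the smallest value, and a group with no non-NaN values is left unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic stand-in for the random `dt` in the question.
dt = pd.DataFrame({
    'key1': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
    'key2': ['d', 'd', 'd', 'd', 'c', 'c', 'c'],
    'data1': [0.0, 0.0, 1.0, np.nan, 2.0, 2.0, 3.0],
})


def fill_with_group_mode(s: pd.Series) -> pd.Series:
    """Hypothetical helper: fill NaNs in one group with that group's mode."""
    modes = s.mode()  # NaN excluded; sorted, so ties go to the smallest value
    if modes.empty:   # group is entirely NaN: nothing to fill with
        return s
    return s.fillna(modes.iloc[0])


# transform keeps the original row index, so the result aligns with dt.
dt['data1'] = dt.groupby(['key1', 'key2'])['data1'].transform(fill_with_group_mode)
print(dt)
```

On the question's real data, the same transform line would replace the three NaNs in the (a, d) group with 0.0 and leave every other value untouched.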
Thank you very much for your help!