Use SeriesGroupBy.value_counts
and select first value of index:
df = df.groupby('col1')['col2'].apply(lambda x: x.value_counts().index[0]).reset_index()
print (df)
col1 col2
0 blue nb
1 green gx
Or add DataFrame.drop_duplicates
:
df = df.groupby('col1')['col2'].value_counts().reset_index(name='v')
df = df.drop_duplicates('col1')[['col1','col2']]
print (df)
col1 col2
0 blue nb
2 green gx
Or use Series.mode
and select first value by positions by Series.iat
:
df = df.groupby('col1')['col2'].apply(lambda x: x.mode().iat[0]).reset_index()
print (df)
col1 col2
0 blue nb
1 green gx
EDIT:
Problem is with only NaN
s groups:
d = {'col1': ['green','green','green','blue','blue','blue'],
'col2': [np.nan,np.nan,np.nan,'nb','nb','mj']}
df = pd.DataFrame(data=d)
f = lambda x: np.nan if x.isnull().all() else x.value_counts().index[0]
#or
#f = lambda x: next(iter(x.value_counts().index), np.nan)
#another solution
#f = lambda x: next(iter(x.mode()), np.nan)
df = df.groupby('col1')['col2'].apply(f).reset_index()
print (df)
col1 col2
0 blue nb
1 green NaN