Have a dataframe with several columns from which I want to extract one row for each "family" of individuals that has the most frequent number ("No"). I have tested this with a for -loop that seems to work, but being a newbe I wanted to know if there is a shorter/smarter way of doing it.
Here is a short example code:
import pandas as pd
ind = [ ('A', 'a', 0.1 , 9) ,
('B', 'b', 0.6 , 10) ,
('C', 'b', 0.4 , 10) ,
('D', 'b', 0.2, 7) ,
('E', 'a', 0.9 , 6) ,
('F', 'b', 0.7 , 11)
]
df = pd.DataFrame(ind, columns = ['Name' , 'Family', 'Prob', 'No'])
res = pd.DataFrame(columns = df.columns)
for name,g in df.groupby('Family'):
v = g['No'].value_counts().idxmax()
idx = g['No'] == v
si = g[idx].iloc[0]
res = res.append(si)
print(res)
I have looked at several exampels that do some of it like this but with that I can only get the "Family" and "No" and not the whole row...