Let's say I have a CSV of names, genders, and counts.
I am trying to find the majority record for each name using groupby() and max(), but I found something strange in the result:
CSV:
Name Gender Count
Connie F 90
Connie F 78
Peter M 200
Connie M 5
Connie F 94
Connie F 67
John M 100
Connie F 73
Connie F 82
Connie F 73
May F 65
The first part of the code looks fine:
>>> import pandas as pd
>>> data = pd.read_csv('names.txt', names=['Name', 'Gender', 'Count'])
>>> data = data.groupby(['Name', 'Gender']).sum().reset_index()
>>> print(data)
     Name Gender  Count
0  Connie      F    557
1  Connie      M      5
2    John      M    100
3     May      F     65
4   Peter      M    200
There are two records for 'Connie', and I need to keep only the majority one (the one with the larger count).
>>> data = data.groupby(['Name']).max().reset_index()
>>> print(data)
     Name Gender  Count
0  Connie      M    557
1    John      M    100
2     May      F     65
3   Peter      M    200
Did I do something wrong? The gender for 'Connie' comes out as M instead of F, even though the max count (557) is correct.
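
For reference, here is a minimal sketch of what seems to be going on, assuming the aggregated frame shown above (it is rebuilt by hand here rather than read from names.txt). groupby().max() takes the maximum of each column independently, so for 'Connie' it picks 'M' (which sorts after 'F') from one row and 557 from another. One possible way to keep whole rows together is to look up the row with the largest Count per name via idxmax(); the variable names per_column_max and majority are only illustrative.

```python
import pandas as pd

# Aggregated counts from the question (the result of the first groupby/sum),
# rebuilt by hand so the example does not depend on names.txt.
data = pd.DataFrame({
    "Name": ["Connie", "Connie", "John", "May", "Peter"],
    "Gender": ["F", "M", "M", "F", "M"],
    "Count": [557, 5, 100, 65, 200],
})

# max() is evaluated per column, so Gender and Count may come from
# different rows: 'M' > 'F' as strings, and 557 > 5 as numbers.
per_column_max = data.groupby("Name").max().reset_index()

# idxmax() gives the row label of the largest Count within each Name;
# .loc then selects those complete rows, so Gender stays with its Count.
majority = data.loc[data.groupby("Name")["Count"].idxmax()].reset_index(drop=True)
print(majority)
```

With this approach the 'Connie' row should keep Gender F alongside Count 557. If two genders tie on Count for a name, idxmax() keeps the first one it encounters, which may or may not be what is wanted.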