1

Let say I have a CSV of name, gender, and its count.

I am looking for majority name by using groupby() and max(). But I found something strange from the result:

CSV:

Name     Gender  Count
Connie   F       90
Connie   F       78
Peter    M       200
Connie   M       5
Connie   F       94
Connie   F       67
John     M       100
Connie   F       73
Connie   F       82
Connie   F       73
May      F       65

First part of the code is looking fine:

>>>data = pd.read_csv('names.txt',names=['Name','Gender','Count'])
>>>data = data.groupby(['Name','Gender']).sum().reset_index()
>>>print (data)
     Name Gender  Count
0  Connie      F    557
1  Connie      M      5
2    John      M    100
3     May      F     65
4   Peter      M    200

There are two records with 'Connie' and I need to select the majority one.

>>>data= data.groupby(['Name']).max().reset_index()
>>>print(data)
     Name Gender  Count
0  Connie      M    557
1    John      M    100
2     May      F     65
3   Peter      M    200

Did I do something wrong so that the gender of 'Connie' is M instead of F? while the max count is correct

NeoNosliw
  • 111
  • 1
  • 1
  • 7

1 Answers1

1

It is correct, because M > F, better explanation is here.

Also I find this, so string columns are not omitted by automatic exclusion of nuisance columns:

strings have lt() defined so the built in min() and max() work on them. If the non-numeric object supports the proper comparison methods, min() and max() aggregate functions should be non-ambiguous.

print (data.groupby(['Name'])['Gender'].max())
Name
Connie    M
John      M
May       F
Peter     M
Name: Gender, dtype: object

For correct output need idxmax for indices of max values per group in Count column and then select by loc:

print (data.groupby(['Name'])['Count'].idxmax())
Name
Connie    0
John      2
May       3
Peter     4
Name: Count, dtype: int64

data = data.loc[data.groupby(['Name'])['Count'].idxmax()]
print (data)
     Name Gender  Count
0  Connie      F    557
2    John      M    100
3     May      F     65
4   Peter      M    200
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252