To get counts per category and rating, I think you need groupby + value_counts + head:
df1 = (df.groupby('gender')['rating']
         .apply(lambda x: x.value_counts().head(1))
         .rename_axis(('gender','rating'))
         .reset_index(name='val'))
print (df1)
  gender rating  val
0      F   PG13    2
1      M      R    2
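For a version that runs on its own, here is a minimal sketch of the same chain. The sample frame is an assumption (the original df is not shown in this answer); it was chosen only so that the result matches the output above.

```python
import pandas as pd

# Hypothetical sample data; the question's real df is not shown here.
df = pd.DataFrame({
    'gender': ['M', 'M', 'M', 'F', 'F'],
    'rating': ['PG', 'R', 'R', 'PG13', 'PG13'],
})

# Count ratings inside each gender group and keep only the most frequent one.
df1 = (df.groupby('gender')['rating']
         .apply(lambda x: x.value_counts().head(1))
         .rename_axis(('gender', 'rating'))
         .reset_index(name='val'))
print(df1)
```

Changing head(1) to head(n) should keep the n most frequent ratings per group instead.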
If you want only the top rating, select the first value of the index per group:
s = df.groupby('gender')['rating'].apply(lambda x: x.value_counts().index[0])
print (s)
gender
F PG13
M R
Name: rating, dtype: object
print (s['M'])
R
print (s['F'])
PG13
Or, if you want only the top counts, select the first value of the Series per group:
s = df.groupby('gender')['rating'].apply(lambda x: x.value_counts().iat[0])
print (s)
gender
F 2
M 2
Name: rating, dtype: int64
print (s['M'])
2
print (s['F'])
2
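If both the top rating and its count are needed at once, one possible alternative (not part of the solutions above) is named aggregation on the grouped column. This is only a sketch: the sample frame and the column names top_rating and top_count are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    'gender': ['M', 'M', 'M', 'F', 'F'],
    'rating': ['PG', 'R', 'R', 'PG13', 'PG13'],
})

# One row per gender: the most frequent rating and how often it occurs.
summary = df.groupby('gender')['rating'].agg(
    top_rating=lambda x: x.value_counts().index[0],
    top_count=lambda x: x.value_counts().iat[0],
)
print(summary)
```

The result is a DataFrame indexed by gender, so summary.loc['M', 'top_rating'] plays the same role as s['M'] above.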
EDIT:
s = df.groupby('gender')['rating'].apply(lambda x: x.value_counts().index[0])
def gen_mpaa(gender):
    return s[gender]
print (gen_mpaa('M'))
print (gen_mpaa('F'))
EDIT:
Solution if genre id values are strings:
print (type(df.loc[0, 'genre id']))
<class 'str'>
df = df.set_index('gender')['genre id'].str.split(',', expand=True).stack()
print (df)
gender
M       0    11
        1    22
        2    33
        0    22
        1    44
        2    55
        0    33
        1    44
        2    55
F       0    11
        1    22
        0    22
        1    55
        0    55
        1    44
dtype: object
d = df.groupby(level=0).apply(lambda x: x.value_counts().index[0]).to_dict()
print (d)
{'M': '55', 'F': '55'}
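One caveat: value_counts().index[0] returns a single label even when several values share the highest count, and I would not rely on which of the tied labels comes first. A small sketch for listing every tied value per gender follows; the frame is reconstructed from the stacked output above, and the name tied is only illustrative.

```python
import pandas as pd

# Sample frame reconstructed from the stacked output above (an assumption).
df = pd.DataFrame({
    'gender': ['M', 'M', 'M', 'F', 'F', 'F'],
    'genre id': ['11,22,33', '22,44,55', '33,44,55', '11,22', '22,55', '55,44'],
})

# Split the comma-separated ids and stack them into one long Series.
# dropna() is a guard in case stack keeps the padding NaN of shorter rows.
stacked = (df.set_index('gender')['genre id']
             .str.split(',', expand=True)
             .stack()
             .dropna())

# For each gender, collect every value whose count equals the maximum count.
tied = {}
for gender, values in stacked.groupby(level=0):
    vc = values.value_counts()
    tied[gender] = sorted(vc[vc == vc.max()].index)

print(tied)
```

With this data several genre ids are tied in both groups, so the single value picked by index[0] is only one of the equally frequent candidates.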
EDIT1:
print (df)
   AGE GENDER rating
0   10      M     PG
1   10      M      R
2   10      M      R
3    4      F   PG13
4    4      F   PG13
s = (df.groupby(['AGE', 'GENDER'])['rating']
       .apply(lambda x: x.value_counts().head(2))
       .rename_axis(('a','b', 'c'))
       .reset_index(level=2)['c'])
print (s)
a   b
4   F    PG13
10  M       R
    M      PG
Name: c, dtype: object
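The same steps as a self-contained sketch, with the frame typed in from the printed df above. The level names a, b and c are just the placeholders used in this answer.

```python
import pandas as pd

# Frame copied from the printed df in this EDIT.
df = pd.DataFrame({
    'AGE': [10, 10, 10, 4, 4],
    'GENDER': ['M', 'M', 'M', 'F', 'F'],
    'rating': ['PG', 'R', 'R', 'PG13', 'PG13'],
})

# Top 2 ratings per (AGE, GENDER) group. The rating labels end up in the
# third index level, which is moved back into a column and selected.
s = (df.groupby(['AGE', 'GENDER'])['rating']
       .apply(lambda x: x.value_counts().head(2))
       .rename_axis(('a', 'b', 'c'))
       .reset_index(level=2)['c'])
print(s)
```

Each group contributes at most two rows, ordered from the most to the least frequent rating; a lookup such as s.loc[(10, 'M')] should then return the ratings for that group.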