
I have three columns, as shown below, and I am trying to return the top 1 and top 2 most frequent values of the third column (rating). I want the output to look like the expected output below. DATA :

print (df)

   AGE GENDER rating
0   10      M     PG
1   10      M      R
2   10      M      R
3    4      F   PG13
4    4      F   PG13

CODE :

s = (df.groupby(['AGE', 'GENDER'])['rating']
       .apply(lambda x: x.value_counts().head(2))
       .rename_axis(('a','b', 'c'))
       .reset_index(level=2)['c'])

output :

print (s)

a   b
4   F    PG13
10  M       R
    M      PG
Name: c, dtype: object

EXPECTED OUTPUT :

print (s['F'])
('PG13')

print (s['M'])

('R', 'PG')
pylearner

1 Answer


I think you need:

s = (df.groupby(['AGE', 'GENDER'])['rating']
       .apply(lambda x: x.value_counts().head(2))
       .rename_axis(('a','b', 'c'))
       .reset_index()
       .groupby('b')['c']
       .apply(list)
       .to_dict()
       )
print (s)
{'M': ['R', 'PG'], 'F': ['PG13']}
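
For anyone who wants to run this end to end, here is a minimal, self-contained sketch. It rebuilds the sample df from the question and wraps the same chain in a helper. The function name top_ratings_by_gender and its n parameter are my own additions for illustration and are not part of the original code.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "AGE": [10, 10, 10, 4, 4],
    "GENDER": ["M", "M", "M", "F", "F"],
    "rating": ["PG", "R", "R", "PG13", "PG13"],
})


def top_ratings_by_gender(frame, n=2):
    """Hypothetical helper: top-n ratings per (AGE, GENDER), collected per GENDER."""
    counts = (frame.groupby(["AGE", "GENDER"])["rating"]
                   # value_counts sorts by frequency, so head(n) keeps the n most common
                   .apply(lambda x: x.value_counts().head(n))
                   # name the three index levels so they become columns a, b, c
                   .rename_axis(("a", "b", "c"))
                   .reset_index())
    # column b holds GENDER, column c holds the rating label
    return counts.groupby("b")["c"].apply(list).to_dict()


result = top_ratings_by_gender(df)
# result maps 'F' to ['PG13'] and 'M' to ['R', 'PG']
```

Individual groups can then be looked up with result['F'] or result['M']. Note that if the same GENDER appears under several AGE values, the lists from each AGE group are concatenated, so a rating can appear more than once.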
jezrael
  • Awesome, it worked... You saved me again. Is there any other way that I can thank you? I would do it :) Thanks a lot. – pylearner Feb 13 '18 at 14:31
  • You are welcome! Btw, does this [solution](https://stackoverflow.com/a/48724663/2901002) not work? – jezrael Feb 13 '18 at 14:33
  • No jezz, I have 15 columns that I need to group by, and my sequence combination changes. There are also null values that it should handle; I had to ignore them and change my sequence again. So the above solution worked. I have created conditions and inserted your code there. – pylearner Feb 13 '18 at 14:38
  • How can I connect with you through LinkedIn or any other social network? – pylearner Feb 14 '18 at 14:50
  • I don't use fb or anything similar. You can send me an email, but I don't check it very often. ;) The email is in my profile. – jezrael Feb 14 '18 at 14:51