I have the following dataframe in Pandas:

import pandas as pd

data = {'Book': ['Author1book1', 'Author1book2', 'Author2book1', 'Author2book2'],
        'Author': ['Author1', 'Author1', 'Author2', 'Author2'],
        'Votes': [34, 4363, 3234, 234]}
df = pd.DataFrame(data)
df

Which gives:

    Book            Author   Votes
0   Author1book1    Author1  34
1   Author1book2    Author1  4363
2   Author2book1    Author2  3234
3   Author2book2    Author2  234
etc.

So essentially I'd first like to find, say, the 10 authors with the most votes. I'm doing that at the moment with:

df_agg = df.groupby(['Author']).agg({'Votes': 'sum'})

then

sort_df = df_agg.sort_values(["Votes"], ascending=False).head(10)

That shows me the top 10 Authors by votes.
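For reference, the same ranking can be sketched a little more compactly with `Series.nlargest`. The sample data here is an assumption for illustration (two authors with two books each), not the real dataset:

```python
import pandas as pd

# Sample data assumed for illustration: two authors, two books each.
df = pd.DataFrame({
    'Book': ['Author1book1', 'Author1book2', 'Author2book1', 'Author2book2'],
    'Author': ['Author1', 'Author1', 'Author2', 'Author2'],
    'Votes': [34, 4363, 3234, 234],
})

# Sum votes per author, then keep the 10 largest totals.
# nlargest returns the rows already sorted in descending order.
top_authors = df.groupby('Author')['Votes'].sum().nlargest(10)
print(top_authors)
```

With fewer than 10 authors, `nlargest(10)` simply returns all of them.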

What I'm ultimately trying to end up with is the following:

Author   Book          Votes
Author1  Author1book2  4363
         Author1book1  34
Author2  Author2book1  3234
         Author2book2  234

So basically I want to show the books, sorted by most votes, for each of the top 10 most voted for authors.

I'm sure there's a much simpler way to do this but I'm just learning pandas so I'm banging my head against it.

Doing something like this:

df_agg = df.groupby(['Author', 'Book']).agg({'Votes': 'sum'})

sort_df = df_agg.sort_values(['Author', 'Votes'], ascending=[True, False]).head(10)

This almost gets me there, but it doesn't limit the result to the top 10 most-voted authors or order the authors by their total votes...

simonk83
  • `df_agg = df.groupby(['Author', 'Book'])['Votes'].sum(); df_agg.groupby('Author').nlargest(10)`? – Quang Hoang Mar 31 '20 at 11:00
  • Thanks, @QuangHoang. Unfortunately that gives me pretty much the same result as the last two commands in my question. It does sort by the number of votes, but the authors seem to come out in arbitrary order (or maybe just whatever order the data is in), so it's not finding the authors with the most votes overall and ordering by that. – simonk83 Mar 31 '20 at 11:17
  • It does give you the 10 most voted books for each author. Maybe you just need to sort after all that. – Quang Hoang Mar 31 '20 at 11:20
  • Yep maybe, I can't work it out though :D – simonk83 Mar 31 '20 at 11:26
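One possible way to do the "sort after all that" step from the comments is sketched below. It is only a sketch under assumptions: the sample data is made up (two authors with two books each), and `n_top`, `book_votes` and the helper column `AuthorTotal` are hypothetical names introduced for illustration. The idea is to attach each author's total votes to every book row with `transform('sum')`, keep only the authors returned by `nlargest`, and then sort by author total first and book votes second.

```python
import pandas as pd

# Sample data assumed for illustration: two authors, two books each.
df = pd.DataFrame({
    'Book': ['Author1book1', 'Author1book2', 'Author2book1', 'Author2book2'],
    'Author': ['Author1', 'Author1', 'Author2', 'Author2'],
    'Votes': [34, 4363, 3234, 234],
})

n_top = 10  # hypothetical parameter: how many authors to keep

# Votes per (author, book) pair, kept as ordinary columns.
book_votes = df.groupby(['Author', 'Book'], as_index=False)['Votes'].sum()

# Attach each author's overall total to every one of their book rows.
book_votes['AuthorTotal'] = book_votes.groupby('Author')['Votes'].transform('sum')

# Names of the n_top authors with the highest overall totals.
top_authors = book_votes.groupby('Author')['Votes'].sum().nlargest(n_top).index

result = (
    book_votes[book_votes['Author'].isin(top_authors)]
    # Authors by total votes (descending), author name as a tie-breaker,
    # then each author's books by votes (descending).
    .sort_values(['AuthorTotal', 'Author', 'Votes'], ascending=[False, True, False])
    .drop(columns='AuthorTotal')
    .set_index(['Author', 'Book'])
)
print(result)
```

Sorting on the helper column keeps each author's books together while still ordering the authors by their totals, which a plain sort on `['Author', 'Votes']` cannot do.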

0 Answers