I have the following dataframe in Pandas:
import pandas as pd
data = {'Book': ['Author1book1', 'Author1book2', 'Author2book1', 'Author2book2'], 'Author': ['Author1', 'Author1', 'Author2', 'Author2'], 'Votes': [34, 4363, 3234, 234]}
df = pd.DataFrame(data)
df
Which gives:
           Book   Author  Votes
0  Author1book1  Author1     34
1  Author1book2  Author1   4363
2  Author2book1  Author2   3234
3  Author2book2  Author2    234
and so on.
First, I'd like to find, say, the top 10 most voted for authors. At the moment I'm doing that with:
df_agg = df.groupby('Author').agg({'Votes': 'sum'})
then
sort_df = df_agg.sort_values(["Votes"], ascending=False).head(10)
That shows me the top 10 Authors by votes.
What I'm ultimately trying to end up with is the following:
Author   Book          Votes
Author1  Author1book2   4363
         Author1book1     34
Author2  Author2book1   3234
         Author2book2    234
So basically I want to show the books, sorted by most votes, for each of the top 10 most voted for authors.
I'm sure there's a much simpler way to do this, but I'm just learning pandas, so I'm banging my head against it.
Doing something like this:
df_agg = df.groupby(['Author', 'Book']).agg({'Votes': 'sum'})
sort_df = df_agg.sort_values(['Author', 'Votes'], ascending=[True, False]).head(10)
Almost gets me there, but it only sorts the authors alphabetically and keeps the first 10 rows. It doesn't limit the result to the top 10 most voted for authors.
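One possible direction, as a minimal sketch rather than a definitive answer: pick the top authors by total votes first, filter the original rows to those authors, then sort by author total and by votes within each author. The sample data below assumes each author has two books, as in the desired output. The names author_total, top_authors, AuthorTotal and result are made up for illustration.

```python
import pandas as pd

# Sample data; assumes each author wrote two books, matching the desired output.
data = {
    'Book': ['Author1book1', 'Author1book2', 'Author2book1', 'Author2book2'],
    'Author': ['Author1', 'Author1', 'Author2', 'Author2'],
    'Votes': [34, 4363, 3234, 234],
}
df = pd.DataFrame(data)

# Total votes per author, then the names of the 10 most voted for authors.
author_total = df.groupby('Author')['Votes'].sum()
top_authors = author_total.nlargest(10).index

# Keep only books by those authors and attach each author's total (helper column).
result = df[df['Author'].isin(top_authors)].copy()
result['AuthorTotal'] = result['Author'].map(author_total)

# Sort authors by total votes (author name breaks ties so books stay together),
# then books by votes within each author.
result = (
    result.sort_values(['AuthorTotal', 'Author', 'Votes'],
                       ascending=[False, True, False])
          .drop(columns='AuthorTotal')
          .set_index(['Author', 'Book'])
)
print(result)
```

Unlike head(10) on the grouped frame, this limits the number of authors rather than the number of rows, so every book by a top author is kept.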