I have a dataset of actors and directors and the popularity of the movie that they have worked together in.
print (actors_director_df.head(3))
actor director popularity counter
0 Chris Pratt Colin Trevorrow 32.985763 0
1 Bryce Dallas Howard Colin Trevorrow 32.985763 0
2 Irrfan Khan Colin Trevorrow 32.985763 0
I want to group by using actor and director because a pair can work in more than one film. which I successfully did use below query.
actor_director_grouped = actors_director_df.groupby(['actor','director']) \
.size() \
.reset_index(name='count') \
.sort_values(['count'], ascending=False) \
.head(10)
print (actor_director_grouped)
actor director count
3619 Clint Eastwood Clint Eastwood 14
19272 Woody Allen Woody Allen 12
9606 Johnny Depp Tim Burton 8
But the popularity column goes missing in this DF.
What I want to do is to do a mean of popularity column after groupby and show the mean in front of actor and director along with the count of the number of movies they did together.
i.e. my ideal output would be something like this.
actor director popularity count
3619 Clint Eastwood Clint Eastwood 32.985763 14
19272 Woody Allen Woody Allen 5.1231231 12
9606 Johnny Depp Tim Burton 3.1231231 8