I am unsure on how to phrase my question, but if you did it in SQL it would be something like this:
SELECT MOVIES.TITLE FROM MOVIES WHERE MOVIES.MOVIE_ID = RATINGS.MOVIE_ID
I have the two dataframes, movies
and ratings
. I would like to do exactly as stated in the SQL statement.
I have this so far:
grouped_by_movies = ratings.groupby("movie_id")
## average and combine
# your code here
average_ratings = grouped_by_movies.rating.mean()
average_ratings.columns = ['movie_id', 'average_rating']
print average_ratings.head()
print "----------------------------------"
## IN ONE LINE OF CODE FOR AVERAGE:
print ratings.groupby("movie_id").rating.mean().head()
print "----------------------------------"
# get the maximum rating
# your code here
max_rating = ratings.rating.max()
print max_rating
print "----------------------------------"
##print ratings.groupby("movie_id").rating.max().head()
# get movie ids with that rating
# your code here
print ratings.loc[ratings["rating"] == max_rating, "movie_id"].head()
print "----------------------------------"
print "Good movie ids:"
print #your code here
sorted_movie_ids = average_ratings.sort_values(ascending = False)
sorted_movie_ids = sorted_movie_ids.to_frame().reset_index()
sorted_movie_ids.columns = ['movie_id', 'average_rating']
print sorted_movie_ids.head()
print
print "----------------------------------"
print
print "Best movie titles"
print # your code here
ids = sorted_movie_ids["movie_id"].unique()
print ids
movie_titles = movies.loc[movies['movie_id'].isin(ids), ['title']]
print movie_titles.head()