0

I am unsure on how to phrase my question, but if you did it in SQL it would be something like this:

SELECT MOVIES.TITLE FROM MOVIES WHERE MOVIES.MOVIE_ID = RATINGS.MOVIE_ID

I have the two dataframes, movies and ratings. I would like to do exactly as stated in the SQL statement.

I have this so far:

grouped_by_movies = ratings.groupby("movie_id")
## average and combine
# your code here
average_ratings = grouped_by_movies.rating.mean()
average_ratings.columns = ['movie_id', 'average_rating']
print average_ratings.head()

print "----------------------------------"

## IN ONE LINE OF CODE FOR AVERAGE:
print ratings.groupby("movie_id").rating.mean().head()

print "----------------------------------"

# get the maximum rating
# your code here
max_rating = ratings.rating.max()
print max_rating

print "----------------------------------"

##print ratings.groupby("movie_id").rating.max().head()

# get movie ids with that rating
# your code here
print ratings.loc[ratings["rating"] == max_rating, "movie_id"].head()

print "----------------------------------"

print "Good movie ids:"
print #your code here
sorted_movie_ids = average_ratings.sort_values(ascending = False)
sorted_movie_ids = sorted_movie_ids.to_frame().reset_index()
sorted_movie_ids.columns = ['movie_id', 'average_rating']
print sorted_movie_ids.head()
print

print "----------------------------------"
print

print "Best movie titles"
print # your code here
ids = sorted_movie_ids["movie_id"].unique()
print ids
movie_titles = movies.loc[movies['movie_id'].isin(ids), ['title']]
print movie_titles.head()
Monil
  • 169
  • 1
  • 1
  • 11

1 Answers1

2

By using idxmax

ratings.groupby("movie_id").rating.mean().idxmax()
BENY
  • 317,841
  • 20
  • 164
  • 234