I am working on an assignment for my Data Science class. I just need help getting started, as I'm having trouble understanding how to use pandas to group rows and select DISTINCT values.
I need to find the movies with the HIGHEST RATINGS by FEMALES. My code returns movies with rating = 5 and gender = 'F', but it also repeats the same movie over and over, since more than one user rated each movie. I'm not sure how to show just the movie, the count of 5-star ratings, and gender = F. Below is my code:
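import pandas as pd

# Load the three data files
m = pd.read_csv('movies.csv')
u = pd.read_csv('users.csv')
r = pd.read_csv('ratings.csv')

# Join users to ratings, then movies to that result (merges on the shared column names)
ur = pd.merge(u, r)
df = pd.merge(m, ur)  # pd.merge already returns a DataFrame

# Keep only 5-star ratings given by female users; this is one row per rating, so titles repeat
top10 = df.loc[(df.gender == 'F') & (df.rating == 5)]
print(top10)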
The data files can be downloaded here.
I just need some help getting started. There's a lot more to the homework, but once I figure this out I can do the rest. I just need a jump-start. Thank you very much.
mv_id  title                               genres                        rating  user_id  gender
1      Toy Story (1995)                    Animation|Children's|Comedy   5       1        F
2      Jumanji (1995)                      Adventure|Children's|Fantasy  5       2        F
3      Grumpier Old Men (1995)             Comedy|Romance                5       3        F
4      Waiting to Exhale (1995)            Comedy|Drama                  5       4        F
5      Father of the Bride Part II (1995)  Comedy                        5       5        F
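
To make the desired output concrete, here is a minimal sketch of what I think the result should look like: one row per movie with a count of 5-star ratings from female users, rather than one row per rating. It assumes the merged frame has `title`, `rating`, and `gender` columns as in the sample above; the small inline DataFrame and the column name `five_star_count` are made up for illustration and are not from the real data files.

```python
import pandas as pd

# Hypothetical stand-in for the merged movies/users/ratings frame
df = pd.DataFrame({
    "title": ["Toy Story (1995)", "Toy Story (1995)", "Jumanji (1995)",
              "Toy Story (1995)", "Jumanji (1995)"],
    "rating": [5, 5, 5, 4, 5],
    "gender": ["F", "F", "F", "F", "M"],
})

# Same filter as in my code: 5-star ratings given by female users
five_star_f = df[(df["gender"] == "F") & (df["rating"] == 5)]

# Collapse the repeated titles: count rows per title, largest first, keep the top 10
top10 = (
    five_star_f.groupby("title")
    .size()
    .sort_values(ascending=False)
    .head(10)
    .reset_index(name="five_star_count")  # illustrative column name
)
print(top10)
```

If "highest ratings" is meant as the highest average rating instead of the most 5-star ratings, the analogous step would presumably be `groupby("title")["rating"].mean()` on the rows where gender is 'F'.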