I am editing my question because some readers found it unclear. I have a data set with four columns. Here is a short excerpt to illustrate:
UserID movie_id rating unix_timestamp
196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923
166 346 1 886397596
298 474 4 884182806
115 265 2 881171488
253 465 5 891628467
305 451 3 886324817
7 451 5 891353892
For some movies, the sum of the ratings they received from users is 50, 44, 88, and so on.
For instance, movie_id 451 received ratings of 3 and 5, so its ratings sum to 8. I want to exclude the movies whose ratings sum to less than 50, compute the average rating of the other movies (those whose ratings sum to more than 50), and show only the top 5 or 10 values.
Here is part of the code:
# ratings is a pandas DataFrame with the columns shown above
grouped_data = ratings['rating'].groupby(ratings['movie_id'])
# average rating per movie
average_ratings = grouped_data.mean()
print("Average ratings:")
print(average_ratings.head())
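
A rough sketch of the filtering described above, not a definitive solution. It computes the per-movie sum and mean in one `groupby`, drops the movies whose summed rating is below a threshold, and keeps the highest averages. The sample values, `MIN_TOTAL`, `TOP_N`, `stats` and `top_movies` are made up for illustration. The threshold is lowered from 50 so the tiny sample leaves something to show, and a movie sitting exactly on the threshold is assumed to be kept.

```python
import pandas as pd

# Made-up sample with the same column names as the data above (illustrative values only).
ratings = pd.DataFrame({
    "UserID":   [1, 2, 3, 4, 5, 6, 7, 8],
    "movie_id": [10, 10, 10, 20, 20, 30, 30, 30],
    "rating":   [5, 4, 3, 2, 1, 5, 5, 4],
})

MIN_TOTAL = 10  # hypothetical threshold; the question uses 50
TOP_N = 5       # hypothetical; the question asks for the top 5 or 10

# Per-movie sum and mean of the ratings.
stats = ratings.groupby("movie_id")["rating"].agg(total="sum", average="mean")

# Keep movies whose summed rating reaches the threshold (assumption: >=),
# then sort by average rating and keep the first TOP_N rows.
top_movies = (
    stats[stats["total"] >= MIN_TOTAL]
    .sort_values("average", ascending=False)
    .head(TOP_N)
)
print(top_movies)
```

If the cutoff should be on the number of ratings a movie received rather than their sum, replacing `"sum"` with `"count"` in the `agg` call would be the corresponding change.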