I'm pretty new to machine learning. I have two dataframes that contain movie ratings. Some movie titles appear in both dataframes with different numeric ratings, while other titles appear in only one of them. How can I combine the two dataframes and average the ratings of any rows that share the same movie title? Thanks for the help!
Removed the `machine-learning` and `numpy` tags, since they have nothing to do with the question. Also, please don't post images of dataframes; transcribing images is tedious. Instead, add the output of `df.to_dict()` to the question, which makes reproducing your data locally very easy. Please go through [How to make good pandas reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Ch3steR Jul 22 '20 at 15:19
1 Answer
You can use `pd.concat` with `GroupBy.agg`:
# df = pd.DataFrame({'Movie': ['IR', 'R'], 'rating': [95, 90], 'director': ['SB', 'RC']})
# df1 = pd.DataFrame({'Movie': ['IR', 'BH'], 'rating': [93, 88], 'director': ['SB', 'RC']})
(pd.concat([df, df1])
   .groupby('Movie', as_index=False)
   .agg({'rating': 'mean', 'director': 'first'}))
  Movie  rating director
0    BH      88       RC
1    IR      94       SB
2     R      90       RC
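For anyone who wants to run this end to end, here is a minimal self-contained sketch built on the sample frames from the comments above. It assumes both frames use the same column names (`Movie`, `rating`, `director`); the variable names `combined` and `result` and the `ignore_index=True` argument are illustrative additions, not part of the original snippet.

```python
import pandas as pd

# Sample data: 'IR' appears in both frames with different ratings.
df = pd.DataFrame({'Movie': ['IR', 'R'], 'rating': [95, 90], 'director': ['SB', 'RC']})
df1 = pd.DataFrame({'Movie': ['IR', 'BH'], 'rating': [93, 88], 'director': ['SB', 'RC']})

# Stack the two frames on top of each other; ignore_index gives a clean 0..n-1 row index.
combined = pd.concat([df, df1], ignore_index=True)

# One row per movie: average the ratings and keep the first director seen for each title.
result = (combined
          .groupby('Movie', as_index=False)
          .agg({'rating': 'mean', 'director': 'first'}))

print(result)
```

Movies that appear in only one frame pass through unchanged, because the mean of a single rating is that rating.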
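Or `df.append` (note that `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so this only works on older versions):
df.append(df1).groupby('Movie', as_index=False).agg({'rating': 'mean', 'director': 'first'})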
  Movie  rating director
0    BH      88       RC
1    IR      94       SB
2     R      90       RC
- If you want the `Movie` column as the index, remove `as_index=False` from `groupby`. The `as_index` parameter of `df.groupby` defaults to `True`, in which case the `Movie` column becomes the index.
- If you want to maintain the original order, set the `sort` parameter to `False` in `groupby`.
  (pd.concat([df, df1])
     .groupby('Movie', as_index=False, sort=False)
     .agg({'rating': 'mean', 'director': 'first'}))
    Movie  rating director
  0    IR      94       SB
  1     R      90       RC
  2    BH      88       RC
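The effect of the two `groupby` parameters mentioned above can be checked side by side. This is a small sketch on the same sample data, assuming matching column names in both frames; `combined`, `spec`, `by_index`, and `in_order` are illustrative names only.

```python
import pandas as pd

df = pd.DataFrame({'Movie': ['IR', 'R'], 'rating': [95, 90], 'director': ['SB', 'RC']})
df1 = pd.DataFrame({'Movie': ['IR', 'BH'], 'rating': [93, 88], 'director': ['SB', 'RC']})

combined = pd.concat([df, df1], ignore_index=True)
spec = {'rating': 'mean', 'director': 'first'}

# Defaults (as_index=True, sort=True): 'Movie' becomes the index,
# and the groups are ordered by key.
by_index = combined.groupby('Movie').agg(spec)

# as_index=False keeps 'Movie' as a regular column;
# sort=False keeps groups in order of first appearance in the data.
in_order = combined.groupby('Movie', as_index=False, sort=False).agg(spec)

print(by_index)
print(in_order)
```

With `sort=False`, a movie's position is decided by the first frame it appears in, so the order passed to `pd.concat` matters.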

Ch3steR