I have three dataframes I would like where I want to merge or join them based on the month column/field, then group by title.
df1:
Month Year TotalNumberofStreams TitleSortName
9 2018 1529 Movie A
9 2018 368 Movie B
1 2018 703 Movie C
1 2018 2278 Movie D
1 2018 382 Movie E
df2:
Month Year video_view TitleSortName
9 2018 3 Movie A
9 2018 6 Movie B
3 2017 9 Movie C
3 2017 4 Movie D
3 2017 3 Movie E
df3:
Month Year Views TitleSortName
9 2018 243 Movie A
9 2018 156 Movie B
9 2018 133 Movie C
Desired Output:
Month Year Views video_view views TotalNumberofStreams TitleSortName
9 2018 NaN NaN NaN 1529 Movie A
9 2018 NaN 3 NaN NaN Movie A
9 2018 243 NaN NaN NaN Movie A
Attempts:
I tried merging based on TitleSortName, with this code here:
merge=df1.merge(df2, how='outer',left_on='TitleSortName',right_on='TitleSortName')
however this returns duplicates, and a lot of data that makes me do even more cleaning.
I also attempted to join based on month:
join_df = df1.join(df2.set_index('Month'),on='Month')
this returns Value Error: Pandas join issue: columns overlap but no suffix specified
Im looking through different articles online, and I see maybe I can use a for loop to iterate through the month column and save the rows to a list that are alike and return a the rows how I desire, as well as lambda join functions, for example a:
lambda x: "/" .join(x), based on the desired columns
is there an easier way to do this or any way to achieve the result i want at all?