I want to compare dataframes based on the 'Horse' column. I want to find rows where the 'Odds' in dataframe 1 are bigger than the 'AvgOdds' in dataframe 2 for a particular horse. For example, this would be rows 0 and 1 in dataframe 1 for 'Indian Sounds'. I want the output to include the 'Race', 'Horse', 'Bookmaker', and 'Difference between Odds and Avg Odds'.
Dataframe 1:
Race Horse Bookmaker Odds
0 Bath R2 Indian Sounds BetEasy 2.65
1 Bath R2 Indian Sounds Neds 2.45
2 Bath R2 Indian Sounds Sportsbet 2.20
3 Bath R2 Hello BetEasy 4.2
4 Bath R2 Hello Neds 4.1
5 Bath R2 Hello Sportsbet 4
Dataframe 2:
Horse AvgOdds
0 Indian Sounds 2.43
1 Hello 4.1
Code to construct dataframes:
cols1 = ['Race', 'Horse', 'Bookmaker', 'Odds']
df1 = pd.DataFrame(data=data1, columns=cols1)
cols2 = ['Race', 'Horse', 'Bookmaker', 'AvgOdds']
df2 = pd.DataFrame(data=data1, columns=cols2)
df3 = df2.groupby(by='Horse', sort=False).mean()
df3 = df3.reset_index()
df4 = round(df3,2)
df1[df1['Odds'] > df4['AvgOdds']])
When I use this code I get an error saying can only compare identically-labeled Series objects. I think this is due to the fact that it is trying to compare row 0 from dataframe 1 with row 0 from dataframe 2 and so on, which does not work as there is more rows in dataframe 1. I need it to refer to row 0-2 in dataframe 1 and row 0 in dataframe 2, then row 3-5 in dataframe 1 and row 1 in dataframe 2.