Comparing two dataframes based on values within column

Question

I want to compare dataframes based on the 'Horse' column. I want to find rows where the 'Odds' in dataframe 1 are bigger than the 'AvgOdds' in dataframe 2 for a particular horse. For example, this would be rows 0 and 1 in dataframe 1 for 'Indian Sounds'. I want the output to include the 'Race', 'Horse', 'Bookmaker', and 'Difference between Odds and Avg Odds'.

Dataframe 1:

Race Horse Bookmaker Odds
0 Bath R2 Indian Sounds BetEasy 2.65
1 Bath R2 Indian Sounds Neds 2.45
2 Bath R2 Indian Sounds Sportsbet 2.20
3 Bath R2 Hello  BetEasy 4.2
4 Bath R2 Hello Neds 4.1
5 Bath R2 Hello Sportsbet  4

Dataframe 2:

Horse AvgOdds
0 Indian Sounds 2.43
1 Hello 4.1

Code to construct dataframes:

cols1 = ['Race', 'Horse', 'Bookmaker', 'Odds']
df1 = pd.DataFrame(data=data1, columns=cols1)
cols2 = ['Race', 'Horse', 'Bookmaker', 'AvgOdds']
df2 = pd.DataFrame(data=data1, columns=cols2)
df3 = df2.groupby(by='Horse', sort=False).mean()
df3 = df3.reset_index()
df4 = round(df3,2)
df1[df1['Odds'] > df4['AvgOdds']])

When I use this code I get an error saying can only compare identically-labeled Series objects. I think this is due to the fact that it is trying to compare row 0 from dataframe 1 with row 0 from dataframe 2 and so on, which does not work as there is more rows in dataframe 1. I need it to refer to row 0-2 in dataframe 1 and row 0 in dataframe 2, then row 3-5 in dataframe 1 and row 1 in dataframe 2.

Could you make your problem as a minimal reproducible one? https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — Ricky, Jun 26 '20 at 07:05

score 0 · Accepted Answer · answered Jun 26 '20 at 07:15

I have assumed your df columns look like below:

df1=pd.DataFrame({
    'Race':['Bath R2','Bath R2','Bath R2','Bath R2','Bath R2','Bath R2'],
    'Horse':['Indian Sounds','Indian Sounds','Indian Sounds','Hello','Hello','Hello'],
    'Bookmaker':['BetEasy','Neds','Sportsbet','BetEasy','Neds','Sportsbet'],
    'Odds':[2.65,2.45,2.20,4.2,4.1,4]
    
})

df2=pd.DataFrame({
    'Horse':['Indian Sounds','Hello'],
    'AvgOdds':[2.43,4.1]
    
})

and if you want to know the rows where the 'Odds' in data frame 1 are bigger than the 'AvgOdds' in data frame 2 you can do an inner join and filter like below:

#merge df1 and df2 based on Horse column 
result_df=pd.merge(df1,df2,on='Horse',how='inner')
#filter out the rows wher Odds are greater than AvgOdds
result_df[result_df['Odds']>result_df['AvgOdds']]

wwnde · Answer 2 · 2020-06-27T00:10:37.800

0

#df1=df1.assign(AvgOdds=df1.Horse.map(dict(zip(df2.Horse,df2.AvgOdds))))
d=dict(zip(df2.Horse,df2.AvgOdds))#dict the columns in df2
df1['AvgOdds']=df1.Horse.map(d)#using dict map the values of df2 to df1
df1[df1['Odds']-df1['AvgOdds']>0]#Boolean select where differences are greater than zero



     Race          Horse Bookmaker  Odds  AvgOdds
0  Bath R2  Indian Sounds   BetEasy  2.65     2.43
1  Bath R2  Indian Sounds      Neds  2.45     2.43
3  Bath R2          Hello   BetEasy  4.20     4.10

edited Jun 27 '20 at 00:10

answered Jun 26 '20 at 07:35

wwnde

26,119
6
18
32

Did this help? It is a faster solution – wwnde Jun 27 '20 at 05:53

Comparing two dataframes based on values within column

2 Answers2