I am trying to create an odds matcher in Python that compares game names using pandas. The problem is that if the names are not a 100% match, the game is not recognised as the same one.
Is there an efficient way to match game names approximately, e.g. by a percentage match or a fuzzy lookup? I cannot think of a reliable way to do this that minimises errors. Any ideas how this might be achieved in Python?
a                        b    c                    d    e
EC Bahia v Salvador U20  2.3  EC Bahia v Salvador  2.3  NaN
EC Bahia v Salvador      2.3  EC Bahia v Salvador  2.3  Match
Bahai Samone v Salvator  2.3  EC Bahia v Salvador  2.3  Match
You could compare just the first word before and after the "v", but this causes problems when the string is something like "Ec FAHI", which is a different game.
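To make the "percentage match" idea concrete, here is a minimal sketch using the standard library's difflib. The helper name `similarity` and the lower-casing step are assumptions made for illustration, not an established method, and the strings are the sample names from the table above.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score, ignoring case and surrounding spaces."""
    a, b = a.strip().lower(), b.strip().lower()
    return 100.0 * SequenceMatcher(None, a, b).ratio()


# Sample pairs taken from the table in the question
pairs = [
    ("EC Bahia v Salvador U20", "EC Bahia v Salvador"),
    ("EC Bahia v Salvador", "EC Bahia v Salvador"),
    ("Bahai Samone v Salvator", "EC Bahia v Salvador"),
]

for left, right in pairs:
    print(f"{left!r} vs {right!r}: {similarity(left, right):.1f}")
```

Note that in this sketch the U20 pair still scores roughly 90 even though the table marks it NaN, so a plain score with a cut-off would probably not be enough on its own to keep age-group teams apart.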
df1
EW WE DA \
0 k k 2
1 EC Bahia Salvador U20 Clube Atletico Mineiro U20 2.3
2 Moreirense Rio Ave 1.62
3 EC Bahia Salvadoa U20 14
4 EC Bahia Salvador 4141
DD
0 https://www.b1
1 https://www.b1
2 https://www.b1
3 https://www.b1
4 https://www.b1
df2
AA AB AC AD \
0 Starting soon k k 3.15
1 In-Play FC Nitra U19 Z Michalovce U19 9.60
2 In-Play Sevilla U19 NK Maribor U19 NaN
3 In-Play Moreirense Rio Av 1.02
4 Starting in 13' EC Bahia Salvador 1.07
AE
0 https://www.be
1 https://www.be
2 https://www.be
3 https://www.be
4 https://www.be
Desired:
AA AB AC AD \
0 Starting soon k k 3.15
1 Starting in 13' EC Bahia 9.60
1 In-Play Moreirense Rio Av 1.02
AE EW WE \
0 https://www.b1 k v k 2
1 https://www.b2 EC v Bahia 4141
3 https://www.b3 Moreirense v Rio Av 1.02
Formula:
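import pandas as pd

# Build a "home v away" key in each frame so they can be joined on it
df1['EW'] = df1['EW'] + ' v ' + df1['WE']
df1['WE'] = df1['DA']  # move the odds into WE
df1['DA'] = df1['DD']  # move the URL into DA
df2['EW'] = df2['AB'] + ' v ' + df2['AC']
print('kk')
# Exact-match join: rows pair up only when the two keys are identical
df3 = pd.merge(df2, df1, on='EW')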
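Since the merge on 'EW' only pairs rows whose keys are identical, one possible workaround is to map each key in one frame to its closest key in the other before merging. Below is a rough sketch of that mapping step using difflib.get_close_matches on plain lists that stand in for the two 'EW' columns. The helper `closest_key`, the 0.85 cut-off and the sample keys (adapted from the frames above) are all assumptions and would need tuning on real data.

```python
from difflib import get_close_matches


def closest_key(name, candidates, cutoff=0.85):
    """Return the candidate most similar to `name`, or None if none reaches `cutoff`."""
    # Compare case-insensitively but return the candidate in its original spelling
    lowered = {c.lower(): c for c in candidates}
    hits = get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


# Stand-ins for df1['EW'] and df2['EW'] (assumed sample values)
df1_keys = ["k v k", "Moreirense v Rio Ave", "EC Bahia v Salvador"]
df2_keys = [
    "k v k",
    "FC Nitra U19 v Z Michalovce U19",
    "Moreirense v Rio Av",
    "EC Bahia v Salvador",
]

# Map every df2 key to its best df1 counterpart (None means no acceptable match)
key_map = {name: closest_key(name, df1_keys) for name in df2_keys}
for name, match in key_map.items():
    print(f"{name!r} -> {match!r}")
```

A dictionary like `key_map` could then be applied to df2['EW'] (for example with `Series.map`) so that the existing exact merge lines up the near-identical names. This compares every key against every candidate, so it may become slow on large frames.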