So the problem is this: I have written a script that compares values in two DataFrames using fuzzywuzzy:
from fuzzywuzzy import fuzz

# ALL_SCHOOLS and TOP100 are DataFrames, and matchPRI and matchPRI100 are lists,
# all defined elsewhere in the script.
def check_match_principal_name(state):
    # Compare every principal name in ALL_SCHOOLS against every one in TOP100.
    for i in range(len(ALL_SCHOOLS['Principal Name'])):
        for a in range(len(TOP100['Principal'])):
            matchADD = fuzz.token_sort_ratio(ALL_SCHOOLS['Principal Name'][i], TOP100['Principal'][a])
            if matchADD > 90:
                print(ALL_SCHOOLS['Principal Name'][i] + ' ' + TOP100['Principal'][a])
                matchPRI.append(i)
                matchPRI100.append(a)
                print(ALL_SCHOOLS['Principal Name'][i])
                print(TOP100['Principal'][a])
    # Flag the matched rows in both DataFrames.
    for i in matchPRI:
        ALL_SCHOOLS.loc[i, 'MatchPRI'] = 1
    for i in matchPRI100:
        TOP100.loc[i, 'MatchPRI'] = 1
    ALL_SCHOOLS.to_excel(f'/Users/Giova/PycharmProjects/Schools/Final_final/{state}1.xlsx')
    TOP100.to_excel(f'/Users/Giova/PycharmProjects/Schools/Final_final/top-100/{state}1.xlsx')
    matchPRI.clear()
    matchPRI100.clear()
It works and I don't get any exceptions, but, for example, in the script above fuzz.token_sort_ratio(ALL_SCHOOLS['Principal Name'][i], TOP100['Principal'][a])
returns 91 for 'Kimberly Beukema' vs. 'Ms. Kimberly Beukema',
while in a second script like this:
from fuzzywuzzy import fuzz
match = fuzz.partial_token_sort_ratio('Kimberly Beukema', ' Ms. Kimberly Beukema')
print(match)
it returns 100,
and I don't understand why the value changes.
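One thing visible in the two snippets is that the first calls token_sort_ratio while the second calls partial_token_sort_ratio. To see whether that alone could account for 91 vs. 100, here is a minimal sketch using only the standard library's difflib instead of fuzzywuzzy. The helper names (normalize_and_sort, full_ratio, partial_ratio) are hypothetical and only approximate the idea of whole-string versus best-substring scoring; this is not the library's actual implementation.

```python
import re
from difflib import SequenceMatcher


def normalize_and_sort(s):
    """Lowercase, turn non-alphanumerics into spaces, and sort the tokens.

    Hypothetical helper: a rough stand-in for the token-sort preprocessing.
    """
    tokens = re.sub(r'[^a-z0-9]+', ' ', s.lower()).split()
    return ' '.join(sorted(tokens))


def full_ratio(a, b):
    """Similarity of the two whole strings, scaled to 0-100."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def partial_ratio(a, b):
    """Best similarity of the shorter string against any equally long
    window of the longer string, scaled to 0-100 (an approximation)."""
    shorter, longer = sorted((a, b), key=len)
    best = 0.0
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return round(100 * best)


a = normalize_and_sort('Kimberly Beukema')
b = normalize_and_sort(' Ms. Kimberly Beukema')

print(a, '|', b)
print('whole-string score:', full_ratio(a, b))
print('best-substring score:', partial_ratio(a, b))
```

Under these assumptions the whole-string score is penalised for the extra "ms" token, while the best-substring score is not, because the shorter name appears in full inside the longer one.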