
I am trying to calculate the similarity between the cities in my dataframe and one static city name. (Eventually I want to iterate through a dataframe and choose the best-matching city name from it, but I am testing my code on this simplified scenario.) I am using fuzzywuzzy's token set ratio. For some reason it calculates the first row correctly, but it then seems to assign that same value to every row.

Code:

import pandas as pd
from fuzzywuzzy import fuzz

test_df = pd.DataFrame({"City": ["Amsterdam", "Amsterdam", "Rotterdam", "Zurich", "Vienna", "Prague"]})

test_df = test_df.assign(Score=lambda d: fuzz.token_set_ratio("amsterdam", test_df["City"]))

print(test_df.shape)

test_df.head()

Result:

        City  Score
0  Amsterdam    100
1  Amsterdam    100
2  Rotterdam    100
3     Zurich    100
4     Vienna    100

If I do the comparison one by one it works:

print(fuzz.token_set_ratio("amsterdam", "Amsterdam"))
print(fuzz.token_set_ratio("amsterdam", "Rotterdam"))
print(fuzz.token_set_ratio("amsterdam", "Zurich"))
print(fuzz.token_set_ratio("amsterdam", "Vienna"))

Results:

100
67
13
13
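A likely explanation, based on reading fuzzywuzzy's source (treat it as an assumption): token_set_ratio expects two strings, so when it receives the whole City Series it converts that Series into a single string whose tokens include "amsterdam". The token intersection then matches the query exactly, the call returns one scalar 100, and assign broadcasts that scalar to every row. The stdlib sketch below mimics the token-set idea with difflib.SequenceMatcher; simple_ratio and simple_token_set_ratio are hypothetical helpers, not fuzzywuzzy's API, and a plain list stands in for the Series.

```python
import difflib


def simple_ratio(a: str, b: str) -> int:
    """0-100 similarity from difflib.SequenceMatcher (hypothetical helper)."""
    return round(100 * difflib.SequenceMatcher(None, a, b).ratio())


def simple_token_set_ratio(s1: str, s2: str) -> int:
    """Simplified token-set score (hypothetical sketch, not fuzzywuzzy's code)."""
    def tokens(s: str) -> set:
        # lowercase and turn anything that is not a letter/digit into a space
        cleaned = "".join(c if c.isalnum() else " " for c in s.lower())
        return set(cleaned.split())

    t1, t2 = tokens(s1), tokens(s2)
    sect = " ".join(sorted(t1 & t2))
    combined_1 = (sect + " " + " ".join(sorted(t1 - t2))).strip()
    combined_2 = (sect + " " + " ".join(sorted(t2 - t1))).strip()
    # best of: shared tokens vs each side, and the two sides vs each other
    return max(
        simple_ratio(sect, combined_1),
        simple_ratio(sect, combined_2),
        simple_ratio(combined_1, combined_2),
    )


cities = ["Amsterdam", "Amsterdam", "Rotterdam", "Zurich", "Vienna", "Prague"]

# Whole column at once: stringified into ONE text containing "amsterdam",
# so the shared token alone matches the query and the score is 100.
print(simple_token_set_ratio("amsterdam", str(cities)))  # 100

# One city at a time: a separate score per row.
print([simple_token_set_ratio("amsterdam", c) for c in cities])
# [100, 100, 67, 13, 13, 27]
```

For the first four cities the sketch reproduces the one-by-one scores shown above; the Prague value comes only from this sketch.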

Thank you in advance!

  • https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Paul H Oct 19 '21 at 18:20
  • Please do not share pictures of code. Why? https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question – user12256545 Oct 19 '21 at 19:15

1 Answer


I managed to solve it by iterating through the rows:

for index, row in test_df.iterrows():
    # score each city on its own instead of passing the whole column
    test_df.loc[index, "Score"] = fuzz.token_set_ratio("amsterdam", row["City"])

The result (from my full dataframe, which also has a Country Code column) is:

        City Country Code  Score
0  Amsterdam           NL    100
1  Amsterdam           NL    100
2  Rotterdam           NL     67
3     Zurich           NL     13
4     Vienna           NL     13