I was experimenting with Python's ngram library and ran into an issue with string similarity. The scores it returned were confusing. Here is what I tried:
>>> import ngram
>>> ngram.NGram.compare('alexp','Alex Cho',N=1)*100
30.0
>>>
>>> ngram.NGram.compare('alexp','Alex Plutzer',N=1)*100
21.428571428571427
>>> ngram.NGram.compare('alexp','Alex Plutzer'.lower(),N=1)*100
41.66666666666667
>>> ngram.NGram.compare('alexp','Alex Cho'.lower(),N=1)*100
44.44444444444444
>>> ngram.NGram.compare('alexp','AlexCho'.lower(),N=1)*100
50.0
>>> ngram.NGram.compare('alexp','AlexPlutzer'.lower(),N=1)*100
45.45454545454545
I expected the most similar string to be the one containing alexp, i.e. Alex Plutzer, but the higher score goes to the other candidate, Alex Cho.
What can I do so that Alex Plutzer gets a higher score than the competing name?
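
For reference, here is a minimal sketch of what I think is happening. It assumes that with N=1 the score is simply the number of shared characters (counted with multiplicity) divided by the size of the combined character multiset. That is my own guess, not something taken from the library's documentation, but it reproduces the scores above. The helper name unigram_similarity is hypothetical and not part of ngram.

```python
from collections import Counter

def unigram_similarity(a, b):
    """Hypothetical helper: multiset Jaccard similarity over single characters.

    Assumption: this mirrors what NGram.compare(..., N=1) computes.
    """
    count_a, count_b = Counter(a), Counter(b)
    # Characters present in both strings, counted with multiplicity.
    shared = sum((count_a & count_b).values())
    # Size of the multiset union of both strings' characters.
    total = len(a) + len(b) - shared
    return shared / total if total else 1.0

for candidate in ('alex cho', 'alex plutzer', 'alexcho', 'alexplutzer'):
    print(candidate, unigram_similarity('alexp', candidate) * 100)
```

If that reading is right, 'alex plutzer' shares all 5 characters of 'alexp' but is divided by 12, while 'alex cho' shares only 4 but is divided by 9, so the longer name is penalized for its extra unmatched characters.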