spaCy's similarity sometimes behaves strangely. If we compare two completely identical texts, we get a score of 1.0, but for texts that are almost identical we can get a score greater than 1. This behavior could break our code. Why do we get a score above 1.0, and can we predict when it will happen?
import spacy

nlp = spacy.load('en_core_web_md')

def calc_score(text_source, text_target):
    return nlp(text_source).similarity(nlp(text_target))

calc_score('software development', 'Software development')
# 1.0000000155153665
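For context: `similarity` is a cosine similarity over floating-point word vectors, and rounding in that arithmetic can push the result a hair outside the mathematically valid range [-1, 1]. Until that is handled upstream, one workaround is to clamp the score. Below is a minimal sketch using a toy float32 cosine instead of spaCy itself (the names `cosine_float32` and `calc_score_safe` are mine, not spaCy API):

```python
import numpy as np

def cosine_float32(u, v):
    # cosine similarity computed in float32, the dtype commonly
    # used to store word embeddings (an assumption here)
    u = np.asarray(u, dtype=np.float32)
    v = np.asarray(v, dtype=np.float32)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def calc_score_safe(u, v):
    # clamp to [-1.0, 1.0] so rounding error can never push
    # the score outside the valid cosine range
    return max(-1.0, min(1.0, cosine_float32(u, v)))

vec = np.random.default_rng(0).standard_normal(300)
score = calc_score_safe(vec, vec)
# score is guaranteed to be <= 1.0, and for identical
# vectors it sits within float rounding of 1.0
```

The same clamping can be applied directly around `nlp(a).similarity(nlp(b))` in the original `calc_score`.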