Consider:
string = 'pizza'
matchings = ['pizzas', 'potato chips', 'cheesy lime', 'pretzels', 'pork']
I am trying to find a good way to find the best match in the list. which I am calculating with:
matchings_indices = {matching:sum([s == m for s,sdx in enumerate(string)\
for m, mdx in enumerate(matching) if sdx<=mdx])/len(string)
for matching in matchings}
matchings_indices
Which results in:
{'pizzas': 1.0,
'potato chips': 0.6,
'cheesy lime': 0.2,
'pretzels': 0.6,
'pork': 0.4}
Simple but good enough! I can pluck out the maximum value and that will be the match (I only need one matching value, calculated scores for clarity). But it really struggles when strings very similar appear in the list:
string = 'pizza'
matchings = ['pizzas', 'pizza fries', 'cheesy lime', 'pizzo', 'pizza']
Now my output becomes:
{'pizzas': 1.0,
'pizza fries': 1.0,
'cheesy lime': 0.2,
'pizzo': 1.0,
'pizza': 1.0}
Of course here pizza should have maximum index. I tried sorting them as well like:
matchings_indices = {matching:sum([s == m for s,sdx in enumerate(sorted(string))\
for moose in matching.split()
for m, mdx in enumerate(sorted(moose)) if sdx==mdx])/len(string)
for matching in matchings}
But in that case this is the output for first case: (Still good enough for very dissimilar strings)
{'pizzas': 0.8,
'potato chips': 0.0,
'cheesy lime': 0.0,
'pretzels': 0.0,
'pork': 0.2}
and here for second:
{'pizzas': 0.8,
'pizza fries': 1.0,
'cheesy lime': 0.2,
'pizzo': 0.6,
'pizza': 1.0}
Which is better but still. pizzas
is a better match than pizza fries
and should be scored higher.
So any help bettering the situation will be great!