I'm looking for a string metric that finds the entry in my list most similar to an arbitrary input. Most common string metrics seem to penalize extraneous characters heavily, even when a substring matches perfectly. For example, 'Corvette, red 2013' and 'corvette' have a match score of 0.11 using difflib.get_close_matches(), but 'octet rev' and 'corvette' have a match score of 0.23.
I know my list will likely have extraneous information (like 'red 2013') but I am more interested in knowing that 'corvette' is an exact match while ignoring that extraneous information. 'Octet rev' would count as a false match for my purposes.
Are there any string match metrics that weight the match in the way that I need? Even better, is there one already implemented in a Python package?
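To make the desired behavior concrete, here is a rough sketch of the kind of scoring I have in mind. It compares the shorter string against every same-length window of the longer one and keeps the best score. The helper name `partial_ratio` and the case-insensitive comparison are my own assumptions for illustration, not something difflib provides; only `difflib.SequenceMatcher` is a real standard-library call here.

```python
from difflib import SequenceMatcher


def partial_ratio(query, candidate):
    """Hypothetical helper: best similarity between the shorter string
    and any same-length window of the longer string (case-insensitive)."""
    shorter, longer = query.lower(), candidate.lower()
    if len(shorter) > len(longer):
        shorter, longer = longer, shorter
    if not shorter:
        return 0.0

    best = 0.0
    # Slide a window the size of the shorter string across the longer one.
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        score = SequenceMatcher(None, shorter, window).ratio()
        best = max(best, score)
    return best


entries = ['Corvette, red 2013', 'octet rev']
# Pick the list entry whose best window is most similar to the input.
best_entry = max(entries, key=lambda entry: partial_ratio('corvette', entry))
```

With a score like this, the extra text in 'Corvette, red 2013' no longer drags the match down, because only the best-matching window counts. It is quadratic-ish in string length, so it is only meant to show the idea, not to be an efficient implementation.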