I am looking for a way to find the closest match between two strings that may have very different lengths. Let's say I have, on the one hand, a list of possible locations like:
Yosemite National Park
Yosemite Valley
Yosemite National Park Lodge
Yosemite National Park Visitor Center
San Francisco
Golden Gate Park San Francisco
Paris
New York
Manhattan New York
Hong Kong
On the other hand, I have multiple sentences like:
- "I proposed to my wife on the 12th of November 1984, during a crazy downpour in the middle of Yosemite in California"
- "I love to walk my dog in Central Park, New York"
- "I love Hong Kong"
Now say I would like to extract the location from each of these sentences. How would I proceed? I know about the Levenshtein distance algorithm, but I'm not sure it will work efficiently here, especially because I have many more locations and many more sentences to match. What I would love to have is a matching score for each location, so that I can pick the one with the highest score, but I have no idea how to compute this score.
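To make the kind of score I mean concrete, here is the crudest thing I can picture: the fraction of a location's words that also appear in the sentence. This is only a sketch of the idea, not something I trust as a solution; the names `tokenize`, `score` and `best_match` are made up for illustration, and it assumes exact word matches only (no typos, no partial words).

```python
import re

LOCATIONS = [
    "Yosemite National Park",
    "Yosemite Valley",
    "Yosemite National Park Lodge",
    "Yosemite National Park Visitor Center",
    "San Francisco",
    "Golden Gate Park San Francisco",
    "Paris",
    "New York",
    "Manhattan New York",
    "Hong Kong",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(location, sentence):
    """Fraction of the location's words that occur in the sentence (0.0 to 1.0)."""
    loc_tokens = tokenize(location)
    if not loc_tokens:
        return 0.0
    return len(loc_tokens & tokenize(sentence)) / len(loc_tokens)


def best_match(sentence, locations=LOCATIONS):
    """Return the location with the highest score (first one wins on ties)."""
    return max(locations, key=lambda loc: score(loc, sentence))


if __name__ == "__main__":
    for s in [
        "I love to walk my dog in Central Park, New York",
        "I love Hong Kong",
    ]:
        print(s, "->", best_match(s))
```

With this, "I love Hong Kong" picks "Hong Kong" and the Central Park sentence picks "New York". The Yosemite sentence is where it falls apart: only the word "Yosemite" matches, so the shortest candidate containing it ("Yosemite Valley") wins purely because of its length, which is not obviously what I want.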
Do you guys have any idea of how to do that? Or perhaps even an implementation or Python package?
Thanks in advance