Let's say I have a string "Hello" and a list
words = ['hello', 'Hallo', 'hi', 'house', 'key', 'screen', 'hallo', 'question', 'Hallo', 'format']
How can I find the n words in the list words that are closest to "Hello"?
In this case, we would have ['hello', 'hallo', 'Hallo', 'hi', 'format'...]
So the strategy is to sort the list words from the closest word to the furthest.
I thought about something like this:
word = 'Hello'
for i, item in enumerate(words):
    if item.lower() > word.lower():
        ...
but it's very slow in large lists.
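For reference, the "sort from closest to furthest" strategy described above could be sketched roughly as follows. This assumes "closest" means the highest case-insensitive `difflib.SequenceMatcher` ratio, which is only one possible definition; the helper name `closest_words` is made up for illustration.

```python
from difflib import SequenceMatcher

def closest_words(target, candidates, n):
    """Return up to n distinct candidates, most similar to target first.

    Assumption: similarity is the SequenceMatcher ratio of the
    lowercased strings (hypothetical choice, not from the question).
    """
    target_lower = target.lower()

    def similarity(candidate):
        # ratio() lies in [0, 1]; 1.0 means the lowercased strings are identical.
        return SequenceMatcher(None, target_lower, candidate.lower()).ratio()

    # dict.fromkeys drops exact duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(candidates))
    # Highest similarity first, then keep only the first n entries.
    return sorted(unique, key=similarity, reverse=True)[:n]

words = ['hello', 'Hallo', 'hi', 'house', 'key', 'screen',
         'hallo', 'question', 'Hallo', 'format']
top3 = closest_words('Hello', words, 3)
print(top3)
```

On the sample list the exact (case-insensitive) match 'hello' ranks first, followed by the two spellings of "hallo". Note that this still scores every word in the list, so it does not by itself address the speed problem.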
UPDATE
difflib works, but it is also very slow. The words list has 630000+ words (sorted, one per line), so checking the list takes 5 to 7 seconds for every closest-word search!
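For context, the difflib attempt probably looks something like the sketch below. This is an assumption about the call being used, not the exact code; the `n=3` and `cutoff=0.6` values and the lowercasing step are illustrative choices.

```python
import difflib

words = ['hello', 'Hallo', 'hi', 'house', 'key', 'screen',
         'hallo', 'question', 'Hallo', 'format']

# get_close_matches is case-sensitive, so compare lowercased, deduplicated copies.
lowered = sorted({w.lower() for w in words})

# n caps the number of results; cutoff (0 to 1) discards weak matches.
matches = difflib.get_close_matches('hello', lowered, n=3, cutoff=0.6)
print(matches)
```

`get_close_matches` checks every candidate in the list against the target, so its running time grows with the length of the list, which would be consistent with the multi-second searches on 630000+ words.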