Given word and a vocabulary I would like to find all entries in the vocabulary with the longest common prefix.
Here are some examples:
> vocabulary = {"english", "englishx", "english-indian", "enya", "spanish"}
> prefix, words = find_by_longest_prefix(vocabulary, "englishy")
english: ['englishx', 'english', 'english-indian']
> prefix, words = find_by_longest_prefix(vocabulary, "entomology")
en: ['enya', 'englishx', 'english', 'english-indian']
> prefix, words = find_by_longest_prefix(vocabulary, "spania
spani: ['spanish']
Imagine now you have to call this method multiple times for different input words but same vocabulary. My naive implementation finds matching words in O(n) time (n is lenght of input word), but takes a lot of memory: O(n*m), where n is the size of vocabulary and m is the length of the longest word in the vocabulary.
from collections import defaultdict
def build_words_by_prefix(vocabulary):
words_by_prefix = defaultdict(list)
for word in vocabulary:
for i in range(1, len(word) + 1):
prefix = word[:i]
words_by_prefix[prefix].append(word)
return words_by_prefix
def find_by_longest_prefix(vocabulary, word):
words_by_prefix = build_words_by_prefix(vocabulary)
for i in range(len(word)+1, 1, -1):
prefix = word[:i]
words = words_by_prefix.get(prefix, False)
if words:
return prefix, words
return False
I'm looking for a solution that is both memory and time efficient. I've heard about tries for more efficient storing of prefixes but I wonder how can I use them here?