I'm trying to get all the words made from the letters, 'crbtfopkgevyqdzsh' from a file called web2.txt. The posted cell below follows a block of code which improperly returned the whole run up to a full word e.g. for the word shocked it would return s, sh, sho, shoc, shock, shocke, shocked
So I tried a trie (know pun intended).
web2.txt is 2.5 MB in size, and contains 2,493,838 words of varying length. The trie in the cell below is breaking my Google Colab notebook. I even upgraded to Google Colab Pro, and then to Google Colab Pro+ to try and accommodate the block of code, but it's still too much. Any more efficient ideas besides trie to get the same result?
# Find the words3 word list here: svnweb.freebsd.org/base/head/share/dict/web2?view=co
trie = {}
with open('/content/web2.txt') as words3:
for word in words3:
cur = trie
for l in word:
cur = cur.setdefault(l, {})
cur['word'] = True # defined if this node indicates a complete word
def findWords(word, trie = trie, cur = '', words3 = []):
for i, letter in enumerate(word):
if letter in trie:
if 'word' in trie[letter]:
words3.append(cur)
findWords(word, trie[letter], cur+letter, words3 )
# first example: findWords(word[:i] + word[i+1:], trie[letter], cur+letter, word_list )
return [word for word in words3 if word in words3]
words3 = findWords("crbtfopkgevyqdzsh")
I'm using Pyhton3