I need to be able to quickly check if a given word is in my dictionary (English wordlist). I'm only concerned with speed of checking membership (not adding or removing elements), and memory use isn't really a problem.
Originally I was using a set like this:
words = set(x.strip().lower() for x in open("/usr/share/dict/words").readlines())
if(word in words):
...
My program took approx. 4s to run on a test input. I then tried to optimise things by using a DAWG (http://pypi.python.org/pypi/pyDAWG) instead by precomputing the DAWG and pickling it:
words = pickle.load(open('wordlistDAWG.pyd'))
if(words.word2index(word) is not None):
...
On the same test input, the program then took around 40s to run (including a few seconds to load the DAWG which I don't care about). I was hoping using the DAWG would make things run quicker!
Maybe I'm missing some understanding about how python hashes things - is a set already the best I'm going to get (O(1) membership test?) rather than a DAWG or a Trie? Will a DAWG just save on memory but not on computation?
Many Thanks!