I have a dictionary in some code that maps a key to a word; the key is the result of an MD5 hash. I have code that essentially wants to get the key for a word and, when the key doesn't already exist, add it to the dictionary.
Here was my first implementation:
    key = int(hashlib.md5(word).hexdigest(), 16)
    if key in self.id_to_word.keys():
        assert word == self.id_to_word[key]
    else:
        self.id_to_word[key] = word
    return key
After profiling my code, I found this to be EXTREMELY slow. So then I tried this, which is functionally equivalent:
    key = int(hashlib.md5(word).hexdigest(), 16)
    try:
        assert word == self.id_to_word[key]
        return key
    except KeyError:
        self.id_to_word[key] = word
        return key
This turned out to be dramatically faster. While I'm certainly happy about the performance improvement, I was wondering if someone could explain why. Is it bad practice to test membership against a dictionary's keys() like that? Does it generate a copy of the keys every time, wasting a lot of computation?
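One way to test the copying hypothesis is to time the lookups directly. In Python 2, keys() returns a new list on every call, so `key in d.keys()` first builds that list and then scans it linearly, while `key in d` and `d[key]` are single hash lookups. The sketch below is only an illustration, not the original code: the 20,000 made-up words and the function names are assumptions, and `list(d.keys())` is used to emulate the Python 2 behavior so the script also runs on Python 3 (where keys() returns a view and the membership test itself is a hash lookup).

```python
import hashlib
import timeit


def key_for(word):
    # Same key derivation as in the question; encode() is needed on Python 3.
    return int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16)


# Hypothetical test data: 20,000 made-up words.
id_to_word = {}
for i in range(20000):
    w = "word%d" % i
    id_to_word[key_for(w)] = w

# Probe with the last key inserted, so the list scan has to walk every entry.
probe = key_for("word19999")


def via_key_list():
    # Emulates Python 2's dict.keys(): build a full list, then scan it.
    return probe in list(id_to_word.keys())


def via_dict():
    # Membership test on the dict itself: one hash lookup, no list built.
    return probe in id_to_word


def via_try():
    # The try/except pattern from the second snippet in the question.
    try:
        id_to_word[probe]
        return True
    except KeyError:
        return False


t_list = timeit.timeit(via_key_list, number=200)
t_dict = timeit.timeit(via_dict, number=200)
t_try = timeit.timeit(via_try, number=200)

print("key in list of keys: %.6f s" % t_list)
print("key in dict:         %.6f s" % t_dict)
print("try / except:        %.6f s" % t_try)
```

If the timings come out as expected, the first variant should be slower by several orders of magnitude, and writing `if key in self.id_to_word:` would keep the readability of the first snippet without the list scan.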