I have a dict with 20,000 keys and a total size of about 150MB. I dump the dict to disk via pickle every hour and load the pickle file again on program startup. Here is the gist of the writing code:
# self.cache is the dict
cache_copy = copy.deepcopy(self.cache)
pickle.dump(cache_copy, cache_file, pickle.HIGHEST_PROTOCOL)
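For context, the dump is triggered from a background thread roughly every hour; a simplified stand-in for my real scheduler looks like the sketch below (the threading.Timer re-arming and the method name _dump_cache are just illustrative, not my actual code):

import threading

def start_hourly_dumps(self):
    # Simplified stand-in for my real scheduler: dump now, then re-arm a
    # daemon timer so the next dump happens an hour later.
    # self._dump_cache is a hypothetical name for the method containing
    # the deepcopy + pickle.dump snippet above.
    self._dump_cache()
    timer = threading.Timer(3600, self.start_hourly_dumps)
    timer.daemon = True
    timer.start()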
Sometimes I get the following error:
    cache_copy = copy.deepcopy(self.cache)
  File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python2.7/copy.py", line 256, in _deepcopy_dict
    for key, value in x.iteritems():
RuntimeError: dictionary changed size during iteration
What is the best way to do this? I would really like to avoid thread locking, since it makes the code more complex. If locking is really necessary, it should be as minimal/simple as possible and still allow some concurrency. There are several constraints in my code that could help here:
- Multiple threads read from and write to the dict. However, writes only ever add (key, value) pairs; existing pairs are never deleted or modified.
- I am open to changing the data structure from a dict to something else, as long as it still supports fast in-memory lookups and writes.
- I don't mind stale writes: if the dict has had some inserts and we dump a snapshot that is a few seconds old, that is fine.
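Given these constraints, here is a rough sketch of the kind of snapshot-and-dump I have been considering (the function name snapshot_and_dump and the retry loop are just illustrative; I am not sure whether dict()'s C-level copy can still hit the same race, so the except clause may be unnecessary):

import pickle

def snapshot_and_dump(cache, cache_path):
    # Keep retrying the shallow copy; since other threads only ever add
    # (key, value) pairs and never mutate values, a snapshot that lands a
    # few seconds late is acceptable for my use case.
    while True:
        try:
            snapshot = dict(cache)  # shallow copy is enough: values are never modified
            break
        except RuntimeError:
            # another thread inserted a key mid-copy; just try again
            pass

    with open(cache_path, 'wb') as cache_file:
        pickle.dump(snapshot, cache_file, pickle.HIGHEST_PROTOCOL)

Is something like this reasonable, or is there a better-suited data structure or pattern for this read-mostly, insert-only workload?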