I suggest using the shelve module for that.
Shelve lets you store a dictionary-like collection of arbitrary (picklable) Python objects in an on-disk file.
An example from the docs:
import shelve

with shelve.open('spam') as db:
    db['eggs'] = 'eggs'
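To see that the data really lives on disk, here is a minimal sketch that stores a value, closes the shelf, and reads the value back after reopening it. The temporary directory and the file name spam are placeholders chosen only for illustration.

```python
import os
import shelve
import tempfile

# Assumption: a throwaway directory, so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "spam")

    # First session: store a value; leaving the block closes the shelf.
    with shelve.open(path) as db:
        db["eggs"] = ["scrambled", "fried"]

    # Second session: the value is unpickled from the on-disk file.
    with shelve.open(path) as db:
        restored = db["eggs"]

print(restored)
```

Only the entry you ask for is unpickled; the rest of the shelf stays on disk.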
Shelve keys must be strings, so this does not directly cover your case of integer keys. For that you can either subclass one of the shelf classes to convert ints to strings, or use pickle for the whole dictionary.
Here is a subclass example:
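from shelve import DbfilenameShelf


class IntShelf(DbfilenameShelf):
    """Shelf that accepts both int and str keys."""

    @staticmethod
    def _encode_key(key):
        # Prefix the key with its type so that 123 and "123" stay distinct.
        # type() rather than isinstance(), as we wish to be specific
        # (bool and other subclasses are rejected).
        if type(key) is int:
            return "i" + str(key)
        if type(key) is str:
            return "s" + key
        raise TypeError(f"keys must be int or str, not {type(key).__name__}")

    def __getitem__(self, key):
        return super().__getitem__(self._encode_key(key))

    def __setitem__(self, key, value):
        super().__setitem__(self._encode_key(key), value)

    # Without these two, "key in db" and "del db[key]" would use the raw key.
    def __contains__(self, key):
        return super().__contains__(self._encode_key(key))

    def __delitem__(self, key):
        super().__delitem__(self._encode_key(key))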
Usage:
>>> db = IntShelf("testdb")
>>> db["123"] = "foo"
>>> db[123] = ["bar", "bar", "bar"]
>>> db["123"]
'foo'
>>> db[123]
['bar', 'bar', 'bar']
Keep in mind that using pickle instead of shelve to store the dictionary has numerous drawbacks:
- You have to load the entire dictionary at once, which can consume a lot of memory for large datasets.
- Changing a single value requires rewriting the entire dictionary.
- Shelve has a cleaner interface than scattering pickle calls all over the place, and it can cache accessed entries in memory when opened with writeback=True.
- If the program crashes midway, you will lose the DB unless you wrapped the whole thing in a try/finally clause, whereas in shelve the database is saved on demand.
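For comparison, here is a rough sketch of the pickle-only approach that the list above describes. The helper names load_db and save_db and the file name db.pickle are hypothetical. The point is that every update loads and rewrites the whole dictionary, and that the try/finally is what keeps the changes if the processing raises midway.

```python
import os
import pickle
import tempfile


def load_db(path):
    """Load the whole dictionary into memory (empty if no file exists yet)."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f:
        return pickle.load(f)


def save_db(path, data):
    """Rewrite the entire file, even if only one value changed."""
    with open(path, "wb") as f:
        pickle.dump(data, f)


# Assumption: a throwaway directory, so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "db.pickle")

    data = load_db(path)
    try:
        data[123] = ["bar", "bar", "bar"]  # pickle handles int keys natively
        data["123"] = "foo"
    finally:
        # Without this, an exception above would discard every change.
        save_db(path, data)

    reloaded = load_db(path)

print(reloaded[123], reloaded["123"])
```

Integer keys work out of the box here, which is the one thing this approach has going for it.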
Remember that disk access is one of the slowest parts of a program, so you want to minimize it.