
I would like to save a dictionary that contains both string and integer keys and multiple datatype values. For example:

dData = {
    'a': ['c','d'],
    1: [5.1, 3.1]
}

To save and load it I used json.dump and json.load. However, my integer keys are converted to strings when I load it.
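
A minimal sketch of the behaviour described above, using an in-memory round trip (json.dumps / json.loads) instead of a file so that it is self-contained. The name `restored` is only illustrative:

```python
import json

dData = {
    'a': ['c', 'd'],
    1: [5.1, 3.1],
}

# JSON object keys are always strings, so the integer key 1 is written as "1"
# and comes back as the string '1'.
restored = json.loads(json.dumps(dData))

print(restored)
print(1 in restored, '1' in restored)
```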

I don't want to change the way the dictionary is created, since it is incredibly convenient for me to keep those keys as integers.

How can I save the dictionary to file and then load it again while conserving type?

Roman
  • @glls That question explains why JSON isn't a good fit for this problem, but as the answers show, there are other ways to answer the OP's question ("How can I save the dictionary to file and load it again...") without using JSON. – Foon May 27 '16 at 10:44

2 Answers


I suggest using the shelve module for that.

shelve lets you store a dictionary-like collection of arbitrary Python objects in an on-disk file.

An example from the docs:

import shelve

with shelve.open('spam') as db:
    db['eggs'] = 'eggs'

That example does not cover your case of integer keys, because shelve keys must be strings. You can either subclass one of the shelf classes so that it converts ints to strings, or use pickle directly.

Here is a subclass example:

from shelve import DbfilenameShelf

class IntShelf(DbfilenameShelf):
    """Shelf that accepts int and str keys by tagging each key with its type."""

    @staticmethod
    def _encode(key):
        # Exact type checks (not isinstance) as we wish to be specific:
        # subclasses such as bool are rejected.
        if type(key) is int:
            return "i" + str(key)
        if type(key) is str:
            return "s" + key
        raise TypeError("keys must be int or str, not " + type(key).__name__)

    def __getitem__(self, key):
        return super().__getitem__(self._encode(key))

    def __setitem__(self, key, value):
        super().__setitem__(self._encode(key), value)

Usage:

>>> db = IntShelf("testdb")
>>> db["123"] = "foo"
>>> db[123] = ["bar", "bar", "bar"]
>>> db["123"]
'foo'
>>> db[123]
['bar', 'bar', 'bar']

Keep in mind that using pickle instead of shelve to store the dictionary has numerous drawbacks:

  1. You need to load the entire dictionary at once, which can consume a lot of memory for large datasets.
  2. Changing a single value requires rewriting the entire dictionary.
  3. Shelve has a cleaner interface than using pickles all over the place, and it can keep an in-memory cache of accessed entries (when opened with writeback=True).
  4. If the program crashes midway, you will lose the DB unless you wrapped the whole thing in a finally clause, whereas shelve saves entries on demand.

Remember, disk access is one of the slowest parts of a program, so you want to minimize it.
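
As a rough sketch of points 1 and 2, the snippet below updates and reads single entries through shelve without touching the rest of the stored data. The temporary directory and the name `db_path` are assumptions made only to keep the example self-contained, and plain string keys are used because the stock shelve classes require them:

```python
import os
import shelve
import tempfile

# Hypothetical location: a fresh temporary directory, so nothing real is overwritten.
db_path = os.path.join(tempfile.mkdtemp(), "example_shelf")

# First run: store two entries.
with shelve.open(db_path) as db:
    db["a"] = ["c", "d"]
    db["b"] = [5.1, 3.1]

# Later run: replace one entry and read both back.
with shelve.open(db_path) as db:
    db["b"] = [9.9]        # only this entry is pickled and written
    stored_a = db["a"]     # each lookup unpickles just the requested entry
    stored_b = db["b"]

print(stored_a, stored_b)
```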

Bharel
  • shelve is just a pickle wrapper. This does not answer the JSON question, IMHO. – Philippe Ombredanne May 27 '16 at 10:15
  • @PhilippeOmbredanne He was not looking for a JSON-specific way. He was looking for a way to store a dictionary with integer keys on disk. The OP thought JSON might help, but I believe shelve is a better, more general, and more efficient solution. – Bharel May 27 '16 at 10:29
  • Good point! And for a pure Python-to-Python exchange this works fine. – Philippe Ombredanne May 27 '16 at 18:36

Thanks for the pointers to the other question, etc. However, nowhere is the exceedingly simple answer given: use pickle. (Why is that? Am I missing a detail?)

import pickle

sTestDataPath = "/path/to/data/test_data.p"

# pickle reads and writes bytes, so open the file in binary mode ('wb' / 'rb')
with open(sTestDataPath, 'wb') as f:
    pickle.dump(dInputData, f)

with open(sTestDataPath, 'rb') as f:
    dInputData = pickle.load(f)
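
A quick check that the key types survive the round trip, using the dictionary from the question. This is only a sketch: the temporary file path and the names `restored` and `key_types` are illustrative assumptions.

```python
import os
import pickle
import tempfile

dData = {
    'a': ['c', 'd'],
    1: [5.1, 3.1],
}

# Hypothetical path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "test_data.p")

with open(path, 'wb') as f:
    pickle.dump(dData, f)

with open(path, 'rb') as f:
    restored = pickle.load(f)

# Collect the types of the restored keys; the int key should still be an int.
key_types = {type(k) for k in restored}
print(restored, key_types)
```
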
Roman
  • I wrote in my answer that you can use pickle, but it has a few faults. The major one is that you need to load the entire dictionary at once, which can consume a lot of memory for large datasets. A second major fault is that changing a single value means rewriting the entire dictionary. Remember, disk access is one of the slowest parts of a program, so you want it to be efficient. If pickle were the better choice for disk-based dictionaries, shelve wouldn't exist in the standard library :-) – Bharel May 27 '16 at 10:36
  • @Bharel I see, thanks for the explanation. Efficiency is not a major necessity in my specific case. But now I have your answer for future reference :) – Roman May 27 '16 at 10:43
  • Sure. It's also safer because it saves on demand, but as you wish. Good luck! :-) – Bharel May 27 '16 at 10:45