I am searching for an object store in Python that allows me to store a dictionary with tuples as keys. I already tried shelve and shove, both of which exit with an error as soon as I pass my dictionary. Are there any solutions out there that provide this?

For shove,

from shove import Shove
data = Shove('file://tmp')
("a",) in data

it gives me AttributeError: 'tuple' object has no attribute 'rstrip'. But only if the tuple is not in data:

from shove import Shove
data = Shove('file://tmp')
data[("a",)] = 2
("a",) in data

would not throw an error.

For shelve,

import shelve
d = shelve.open('tmp/test.db')
d[('a',)] = 2

gives me TypeError: dbm mappings have string indices only

Milla Well
  • have you tried pickle? – Padraic Cunningham Jul 22 '15 at 14:26
  • @PadraicCunningham I would love to have the simplicity of accessing data just by `data[tuple(key1,key2)] = 3` as it is provided by shelve and shove. I thought with pickle, I can only load or save the whole dictionary – Milla Well Jul 22 '15 at 14:29
  • 1
    you could build a proxy class that just json-encodes the input (so it's a string) and passes that on to shelve/shove... then they would be guaranteed to be strings... assuming the tuples are simple (i.e. only composed of data that is json-encodable, not complex classes)... – Corley Brigman Jul 22 '15 at 14:47
  • @CorleyBrigman beat me to it. Are your `tuple`s convertible to a unique string representation? e.g., `tuple`s of `int`s should(?) have consistent and unique string representations, and the original `tuple` of `ints` can be retrieved with, e.g., `tuple(map(int,"(1,2,3)"[1:-1].split(',')))`. – hBy2Py Jul 22 '15 at 14:49
  • @CorleyBrigman as my examples allude, I have tuples of strings, which make a conversion to a combined string error-prone - with correct escaping this *would be* a working solution, but is it a good one? – Milla Well Jul 22 '15 at 15:03
  • It's not error-prone... i posted an answer that demonstrates this, every tuple that is different should have a different (backwards-compatible) encoding... but it does involve processing on every lookup. I don't see any way around that really though (somewhere). I realized that you may get better performance with `repr` vs `json.dumps` though...initial tests show about an order of magnitude improvement there. i'll propose a solution below... – Corley Brigman Jul 22 '15 at 16:53
  • actually, what i was going to post is a simpler version of what you've already accepted... but in general, every picked tuple should be unique. – Corley Brigman Jul 22 '15 at 17:11
  • btw (way too many comments, sorry!) - repr is about 3x faster than pickle: `json.dumps((1,0))` = ~4.37uS, `pickle.dumps((1,0))` = ~1.42uS, `repr((1,0))` = ~511nS on my machine. – Corley Brigman Jul 22 '15 at 17:12
  • @CorleyBrigman what would you use as inverse of `repr`? wouldn't `eval` be a loss of generality? – Milla Well Jul 24 '15 at 09:24
  • it's not a loss of generality... `('1,0', '0')` is different from `('1', '0,0')`, and the `repr` versions are also different (they're exactly what i wrote above). that said. if you need to iterate keys a lot (which requires reversing them), `ast.literal_eval` (what i would use) is very slow (~15 uS, vs. ~1.5uS for cPickle). if you're only doing lookups, then you don't need to reverse. But I suppose pickle is far more general, and these times are probably going to be dwarfed by DB lookup for most cases (when it's not already in memory) anyways... – Corley Brigman Jul 24 '15 at 14:32
  • btw, a solution like @Brian's above is way faster than even cPickle (i used `tuple(int(x) for x in y[1:-1].split(','))` before i saw his, which is very close, and took 300nS). but i don't think a simple solution like that exists for arbitrary strings. – Corley Brigman Jul 24 '15 at 14:34
  • @CorleyBrigman Seems like the primary constraint for using tuples of strings as keys would be that the strings must be guaranteed to never contain a comma. `str` rep of a `tuple` should always have enclosing parentheses (`key[1:-1]` seems like it should be universally reliable), and then a `split` on `,` should recover the tuple. *{tests it}* Doesn't work, actually -- `str({tuple of strings})` explicitly stores the quotes around each tuple element. Would take more finagling. – hBy2Py Jul 24 '15 at 14:46
  • something like `[x[1:-1] for x in xs[1:-1].split(',')]` should work in that case. Not allowing commas in the strings might not be an onerous requirement - they could be identifiers, for instance - but it's certainly not general. (literal_eval or pickle both handle this already, of course, by either doing full syntax decoding (literal_eval) or storing an intermediate representation (pickle). – Corley Brigman Jul 24 '15 at 17:57
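The encoding trade-off discussed in the comments above can be sketched as follows (a rough illustration, assuming tuples of plain strings):

```python
import pickle
from ast import literal_eval

# two keys that would collide under naive comma-joining
a = ('1,0', '0')
b = ('1', '0,0')

# repr keeps them distinct, and literal_eval safely inverts it
assert repr(a) != repr(b)
assert literal_eval(repr(a)) == a

# pickle round-trips too, at the cost of a non-readable key
assert pickle.loads(pickle.dumps(a)) == a
```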

2 Answers

shelve is a module from the Python Standard Library. The doc is clear about that: "the values (not the keys!) in a shelf can be essentially arbitrary Python objects — anything that the pickle module can handle ... the keys are ordinary strings".

By construction shelve will only accept strings as keys.

Shove is still in beta according to its documentation on PyPI, and I could not see any evidence that it supports anything other than a string for the key (the error object has no attribute 'rstrip' suggests it does not).

If I were you, I would stick to the well-known shelve and just wrap it with a key serialisation layer. As suggested by Padraic Cunningham, pickle should do the job.
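As a minimal sketch of that idea, using a plain dict as a stand-in for the shelf: pickle.dumps turns the tuple into a flat (byte)string, which a string-keyed store can accept.

```python
import pickle

shelf = {}  # stand-in for a shelve shelf, which needs string keys

key = ('a', 'b')
skey = pickle.dumps(key)   # serialize the tuple into a flat key
shelf[skey] = 2

assert shelf[pickle.dumps(key)] == 2   # same tuple -> same serialized key
assert pickle.loads(skey) == key       # the original tuple is recoverable
```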

Here is a (not extensively tested) possible implementation:

import collections
import pickle

# Python 2 style; on Python 3 use collections.abc.MutableMapping / Iterator
class tuple_dict(collections.MutableMapping):
    class iterator(collections.Iterator):
        def __init__(self, d):
            self.it = iter(d.udict)
        def __iter__(self):
            return self
        def next(self):          # __next__ on Python 3
            return pickle.loads(next(self.it))
    def __init__(self, udict):
        self.udict = udict
    def __getitem__(self, key):
        ukey = pickle.dumps(key)
        return self.udict[ukey]
    def __setitem__(self, key, value):
        ukey = pickle.dumps(key)
        self.udict[ukey] = value
    def __delitem__(self, key):
        ukey = pickle.dumps(key)
        del self.udict[ukey]
    def keys(self):
        return [ pickle.loads(key) for key in self.udict.keys() ]
    def __iter__(self):
        return self.iterator(self)
    def __len__(self):
        return len(self.udict)
    def __contains__(self, key):
        return pickle.dumps(key) in self.udict
    def sync(self):
        self.udict.sync()
    def close(self):
        self.udict.close()

You would use it this way:

import shelve
underlying_d = shelve.open('tmp/test.db')
d = tuple_dict(underlying_d)

d will then accept tuples as keys and store everything in the underlying shelf.
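To see what the wrapper does under the hood, the pickled keys in the backing mapping can be decoded back into tuples, just as tuple_dict.keys() does (illustrated here with a plain dict rather than a real shelf):

```python
import pickle

backing = {}
backing[pickle.dumps(('a', 'b'))] = 2
backing[pickle.dumps(('c',))] = 3

# recover the original tuple keys from the serialized ones
keys = sorted(pickle.loads(k) for k in backing)
assert keys == [('a', 'b'), ('c',)]
```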

NB: if you later want to use a different persistence implementation, provided that implementation is a mapping (a dict-like class), you can reuse tuple_dict by simply replacing the close and sync methods (which are shelve-specific) with whatever the other implementation needs. Apart from these two methods, tuple_dict just wraps an ordinary dict, and as such any mapping class.

Serge Ballesta

I don't know how Pythonic this is, but... how about defining a constant separator string as something almost impossible to occur in your key strings:

sep = '#!#!#!#'

and then, when you need to create a key for shelve out of a tuple of strings, just .join them into a crude hash:

import shelve
d = shelve.open('tmp/test.db')
d[sep.join(('a',))] = 2

If you should need to regenerate a tuple key from information contained within the shelve repository, it's as easy as a .split:

my_dict = { tuple(k.split(sep)): d[k] for k in d.keys() }

Note that this direct dict comprehension syntax is only supported in Python 2.7 and newer, but there are alternatives for 2.6 and earlier.

In your case, since you already have a dictionary defined, you'd have to do some dict-fu to hot-swap your current tuple keys for the str-ified hash when interacting with the shelve repository, but this shouldn't be too hard.

This approach isn't completely bug-free, but arguably can be made such that the probability of problems arising from collisions of sep with your tuple-of-str keys is vanishingly small. Also, note that this approach will only work if your keys are strictly tuples of strs.
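The round trip can be sanity-checked in isolation (assuming, as above, that sep never occurs inside the key strings):

```python
sep = '#!#!#!#'

key = ('a', 'b', 'c')
skey = sep.join(key)                  # flat string usable as a shelve key
assert tuple(skey.split(sep)) == key

# one-element tuples survive the round trip as well
assert tuple(sep.join(('a',)).split(sep)) == ('a',)
```

One edge case to keep in mind: the empty tuple does not round-trip, since ''.split(sep) returns [''] rather than [].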

hBy2Py