-1

P/S: The approach in best way to extract subset of key-value pairs from python dictionary object is recreating a new subkeys' dictionary. It is slow (I have tried it). The answer of using Subdicview given by shx2 below is great in terms of efficiency.

I have a python dictionary, e.g.,

d={"a1":Obj1, "a2":Obj2,"a3":Obj3,...,"a10":Obj10}

Where Obj1 to Objn is some objects of self created python class.

The problem is that, in a loop of 100 millions time, I need different subset of keys at each iteration, say I need "a1" to "a3", what I do now is I reconstructed the dictionary

d1={"a1":Obj1, "a2":Obj2,"a3":Obj3}

whenever I want to use it. In the end, I do 100 millions reconstructions of dictionaries.

Is there a more efficient way to handle such case (e.g, muting the keys in d that I am not interested) without reconstructing the dictionary each time in the loop?

Community
  • 1
  • 1
william007
  • 17,375
  • 25
  • 118
  • 194
  • `in a loop of 100 millions time, I need different subset of keys at each iteration` - Can you please explain? – thefourtheye Oct 18 '14 at 18:26
  • 1
    Do you have a data structure holding all the subsets for each iteration to start with? – shx2 Oct 18 '14 at 18:28
  • @shx2 no, which subset is depends on the input data, can assume the subsets are unknown in advanced. – william007 Oct 18 '14 at 18:30
  • @william007 Do you have a problem accessing the items with keys? – thefourtheye Oct 18 '14 at 18:32
  • @thefourtheye no problems, but say subset is `"a1"`,`"a2"`,`"a3"`, using `d` without reconstructing a new dictionary, `"a10" in d` will return true, which is not what I want. – william007 Oct 18 '14 at 18:35
  • I still don't really understand your question. How do you intend to "get" the keys you want without creating some kind of new data structure to hold them? – BrenBarn Oct 18 '14 at 18:46
  • @BrenBarn I am thinking in the direction that by using the dictionary `d`, I can somehow "mute" the keys I am not interested, but I am opened to any possibilities. – william007 Oct 18 '14 at 18:49
  • @william007: Can you give an example of what you actually need to do with the result? – BrenBarn Oct 18 '14 at 18:53
  • possible duplicate of [best way to extract subset of key-value pairs from python dictionary object](http://stackoverflow.com/questions/5352546/best-way-to-extract-subset-of-key-value-pairs-from-python-dictionary-object) – three_pineapples Oct 19 '14 at 04:21

4 Answers4

1

You can use the following "lightweight" sub-dict-view class. This is possibly the fastest approach, as it avoids creating new dicts at each iteration (creating the view object is fast).

from UserDict import DictMixin

class SubDictView(DictMixin):

    def __init__(self, dct, keys):
        self._dct = dct
        self._keys = keys

    def __getitem__(self, key):
        if key not in self._keys:
            raise KeyError(key)
        return self._dct[key]

    def keys(self):
        return set(self._dct) & self._keys

    def __setitem__(self, key, val):
        raise RuntimeError('SubDictView is read-only')

    def __delitem__(self, key):
        raise RuntimeError('SubDictView is read-only')

d = {'a': 1, 'b': 2, 'c': 3}
dv = SubDictView(d, {'b', 'c'})
print dv
# {'c': 3, 'b': 2}
print 'a' in dv
# False
print dv['b']
# 2
print dv.get('a', 999)
# 999

If you already have the subset of keys stored as a set, you can gain further speed by avoiding the conversion-to-set in __init__.

william007
  • 17,375
  • 25
  • 118
  • 194
shx2
  • 61,779
  • 13
  • 130
  • 153
  • 1
    Thanks, this is in line of what I am thinking of – william007 Oct 18 '14 at 18:55
  • @shx2/@william007, So, what you have achieved here is, instead of doing 100 millions reconstructions of dictionaries, you are now doing 100 millions constructions of sets. That won't improve the performance. – Debanshu Kundu Oct 18 '14 at 19:26
  • @DebanshuKundu if the keys-subsets are already stored as sets, or set-like (i.e. fast lookup), the conversion-to-set can be skipped, which would make it very fast. Also, if the keys-subsets are guaranteed to be contained in the full set of keys, the implementation of `keys()` can be changed to `return self._keys`, which would speed things up further. – shx2 Oct 18 '14 at 19:30
  • 1
    @shx2, Yes, but then you can also create sub-dictionaries of the original dictionary and use them instead or using the original dictionary. Either way one would have to create 100 millions hashes either inside the loop or before the loop starts. – Debanshu Kundu Oct 18 '14 at 19:39
  • @DebanshuKundu under these conditions, creating N views (and accessing them) is much faster than creating N dicts (and accessing them). – shx2 Oct 18 '14 at 19:44
  • 1
    @shx2 Thanks, I have tested this method this method is indeed faster than dictionary recreation. I change your answer from list to set, as it would be way more faster without conversion. Even I pass in a set, with set(keys) in SubDicView, it runs slower. – william007 Oct 19 '14 at 03:43
  • 1
    @DebanshuKundu it indeed runs faster! Even with list conversion it also runs faster, but definitely without conversion is much more faster. – william007 Oct 19 '14 at 03:49
0

One (speed-oriented) solution would be to use a pandas.Series.

import pandas as pd
series = pd.Series(d.values(), index = d.keys())
subseries1 = series[:3]
subseries2 = series[10:20]
...

If you need a subset of keys which are not consecutive indices, you can still use something like:

subseries3 = series.ix[[1,3,8]]

though in this case it might prove a bit slower, because this kind of indexing (as opposed to slicing) causes a new series to be created (instead of a view to the original series, which is much faster).

pandas.Series has an interface which resembles dicts in some ways, so you wouldn't need to changes the rest of your code by much (or at all).

shx2
  • 61,779
  • 13
  • 130
  • 153
0

Your question is far from clear, but if I understand correctly, you can use operator.itemgetter. If your dictionary is like this:

d = {'a1': 1, 'a2': 2, 'a3': 3, 'a4': 4, 'a5': 5}

Then:

>>> operator.itemgetter('a1', 'a3', 'a5')(d)
(1, 3, 5)
BrenBarn
  • 242,874
  • 37
  • 412
  • 384
  • Thanks, but not the question I am asking, so sorry for that. What I need is a dictionary `d` where `"a1" in d` return true, and `"a2" in d` return false, `d["a1"]` will return 1, `d["a2"]` will throw exception, if the subset is `"a1","a3","a5"` – william007 Oct 18 '14 at 18:59
  • @william007: Again, can you please explain *where you are getting those keys*. What is the data structure that holds the values `"a1", "a2", "a3"` that you want to use to make your new dict? – BrenBarn Oct 18 '14 at 19:04
  • You can assume they are given by the users, you can assume of any data structures that are holding them. – william007 Oct 19 '14 at 00:59
0

It is crucial to note that set(dict) does not create a view in Python 2.7 (as the accepted answer implies). It creates a set, and doing it that way is actually pretty slow. Dictionary views in Python 2.7 are accessed by the view* methods on dictionaries (in this case, we want dict.viewkeys())

Ryan
  • 1