3

I have a (fixed) set of keys for which I store a value. I often look up the value for a key and increment or decrement it. A typical dict usage.

x = {'a': 1, 'b': 4, 'c': 3}
x['a'] += 1

Additionally however, just as often as incrementing or decrementing values, I also need to know the key for the i-th largest (or smallest) value. Of course I can do the sorting:

s = sorted(x, key=lambda k:(x[k],k))
s[1] == 'c'

The problem is sorting every time seems rather expensive. Especially because I only increment one item in between sorts. I feel that I could use another data structure better suited for this. A tree perhaps?

Paul
  • 766
  • 9
  • 28
  • I don't know of a structure that would be more efficient on the back end. If you kept a sorted list of keys, then you might save a few clock cycles there. You'd only have to sort it on insert, instead of every time like you are effectively doing now. You just say `s[keyList[i]]` to get your data out. – Hoopdady Oct 17 '13 at 15:01
  • https://stackoverflow.com/questions/2298165/pythons-standard-library-is-there-a-module-for-balanced-binary-tree – Ciro Santilli OurBigBook.com May 23 '17 at 08:48

5 Answers5

2

You could use blist's sorteddict to keep the values in order. Here's a quick implementation of a dictionary which, when iterated over, returns its keys in order of its values (not really tested intensively):

import collections
from blist import sorteddict

class ValueSortedDict(collections.MutableMapping):
    def __init__(self, data):
        self._dict = {}
        self._sorted = sorteddict()
        self.update(data)

    def __getitem__(self, key):
        return self._dict[key]

    def __setitem__(self, key, value):
        # remove old value from sorted dictionary
        if key in self._dict:
            self.__delitem__(key)
        # update structure with new value
        self._dict[key] = value
        try:
            keys = self._sorted[value]
        except KeyError:
            self._sorted[value] = set([key])
        else:
            keys.add(key)            

    def __delitem__(self, key):
        value = self._dict.pop(key)
        keys = self._sorted[value]
        keys.remove(key)
        if not keys:
            del self._sorted[value]

    def __iter__(self):
        for value, keys in self._sorted.items():
            for key in keys:
                yield key

    def __len__(self):
        return len(self._dict)

x = ValueSortedDict(dict(a=1, b=4, c=3))
x['a'] += 1
print list(x.items())
x['a'] += 10
print list(x.items())
x['d'] = 4
print list(x.items())

This gives:

[('a', 2), ('c', 3), ('b', 4)]
[('c', 3), ('b', 4), ('a', 12)]
[('c', 3), ('b', 4), ('d', 4), ('a', 12)]
  • This looks awesome. If I get it right, __setitem__ takes roughly O(log(n)). Finding the i-th key based on sorted values only needs to iterate over the sorteddict which seems much quicker than the O(n log(n)) complexity of a full sort. Thanks! – Paul Oct 18 '13 at 10:50
  • You're welcome. Meanwhile I realized that an obvious further optimization, for your rather special use case, is possible. Instead of deleting an item and then reinserting the update, which requires a fresh search over the btree, you actually know in advance that an item in self._sorted will shift by at most one location (or remain where it is). You could optimize for this by rolling your own special cased sorteddict whose keys are integers that are only incremented and decremented, saving you one search per update. – Matthias C. M. Troffaes Oct 18 '13 at 12:52
  • Maybe I'm being dense, but why do you need to maintain a separate Python `dict` in addition to the `blist.sorteddict`? Doesn't the `sorteddict` already handle everything? – John Y Oct 18 '13 at 14:44
  • @JohnY I think that time complexity is different, in particular assignment, and built-in `dict` might be more efficient. @MatthiasC.M.Troffaes yes that would probably speed it up, at the cost of loss of generality. It occurred to me that what I am looking for is quite similar to a database table with two views -- one sorted on the first column, and the other sorted on the second -- except maybe that I want rows by their index in the second view. Does that make such a big difference? – Paul Oct 20 '13 at 00:16
  • @Paul: Assigning to `dict` is more efficient than assigning to `sorteddict`, but how will you save time if you have to assign to **both** a `dict` **and** a `sorteddict`? The whole point of `sorteddict` is to do everything that you've asked for in your question. – John Y Oct 20 '13 at 01:55
  • @JohnY The `dict` is used to look up the values by key. The `sorteddict` on the other hand is used to store the keys by value. When updating the dictionary, the above implementation first needs the value that goes with the specified key: it uses a `dict` to do this quickly (no need for sorting here). When it has this value, it can efficiently update the `sorteddict`. – Matthias C. M. Troffaes Oct 29 '13 at 13:55
  • @Paul Indeed this is like a table with multiple views. To solve your problem in C++, you could probably use boost's [MultiIndex](http://www.boost.org/doc/libs/1_54_0/libs/multi_index/doc/index.html) and in fact the MultiIndex documentation explicitly mentions relational databases. If I understand correctly your second question about wanting the rows by index in the second view, I guess you'd want something like a sortedlist for that. – Matthias C. M. Troffaes Oct 29 '13 at 14:03
  • @MatthiasC.M.Troffaes: I think you and Paul are both completely misunderstanding `blist.sorteddict`. It already has O(1) average lookup. The whole point of it is to keep the `dict` lookup behavior, but to take care of maintaining the sorted keys for you. If you look at the source, you'll see it already has a `self._map` and a `self._sortedkeys`. Now you are proposing that **in addition** to both of those data structures built into `sorteddict`, you **also** want to maintain your own `dict`?! – John Y Oct 29 '13 at 19:53
  • @JohnY Yes. The sorteddict in the above is not used to sort the keys of dict; it's used to sort the values of dict. – Matthias C. M. Troffaes Nov 03 '13 at 15:02
1

You can use OrderDict from collections. Though it is unavailable in old python versions.

from collections import OrderedDict

If you have django installed you can use django.utils.datastructures.SortedDict

Nik
  • 420
  • 4
  • 16
  • `OrderedDict` only maintains insertion order, so would not be a good fit for OP's case (he needs sorting by value). – John Y Oct 17 '13 at 16:30
  • There is no problem on sorting any way. Important, that ordering is saved. – Nik Oct 18 '13 at 09:54
  • Did you read the question? He needs to be able to find the i-th largest value. To do that with an OrderedDict he would need to either search through all the values, or he would have to sort by value first, which is what he is trying to avoid. – John Y Oct 18 '13 at 14:58
0

Use Operator:

import operator

max(x.iteritems(), key=operator.itemgetter(1))[0]

From Docs:

operator.itemgetter(*items)

Return a callable object that fetches item from its operand using the operand’s getitem() method. If multiple items are specified, returns a tuple of lookup values. For example:

I don't knw if it's the best solution but it works.

Kobi K
  • 7,743
  • 6
  • 42
  • 86
0

why not use a Counter from collections? Then you can use Counter.most_common() to get the sorted list.

>>> from collections import Counter
>>> x = Counter({'a': 1, 'b': 4, 'c': 3})
>>> x['a'] += 1
>>> x.most_common()
[('b', 4), ('c', 3), ('a', 2)]
grim
  • 760
  • 4
  • 13
  • 1
    This would be exactly what I need. Reading the source code however, `most_common` just calls `sorted`, which defeats the whole purpose. :( – Paul Oct 18 '13 at 10:54
0

I think most of the python structures are going to do something similar to what you have done in your example. The only thing I can think of to make it a little more efficient is to keep a sorted list of your keys. That way you only have to sort each time you insert. In your method you have to sort each time you want to access a value by index. Here is an example:

x = {'a': 1, 'b': 4, 'c': 3}
x['a'] += 1

keyList = sorted(x.keys())

print x[keyList[1]]
4

x['e'] = 7
x['j'] = 11
x['d'] = 6
x['h'] = 8

keyList = sorted(x.keys())

print x[keyList[3]]
6
print x[keyList[4]]
7

Hope that helps.

Hoopdady
  • 2,296
  • 3
  • 25
  • 40