2

I've a dictionary such as this:

my_dict=collections.OrderedDict([((123, 1), 'qwe'), ((232, 1), 'asd'), ((234, 2), 'zxc'), ((6745, 2), 'aaa'), ((456, 3), 'bbb')])

The combination of the tuple is always unique and I would like to maintain the order of insertion, and hence OrderedDict. I've a well over ~10K items in the dict. How can I efficiently maintain a counter that gives the count of the second element in the tuple? Basically, I need to know the count whenever I would like to add/delete an item in the key. Right now I just iterate through my_dict and get the counter everytime but it seems to be very expensive to do that.

In the above example I want the output to be:

1:2 # As in 1 occurs 2 times 
2:2
3:1

Right now I do the following:

from collections import OrderedDict, Counter
my_dict = OrderedDict()
my_dict[(123,1)] = 'qwe'
my_dict[(232,1)] = 'asd'
my_dict[(234,2)] = 'zxc'
my_dict[(6745,2)] = 'aaa'
my_dict[(456,3)] = 'bbb'
cnt = []
for item in my_dict.keys():
    cnt.append(item[1])
print Counter(cnt)

I'm not sure if this is the best way but is there a way to override the the = operator and pop function, such that it adds or subtracts a count every time I do that operation?

0x0
  • 2,915
  • 5
  • 40
  • 67
  • You'd probably be best served by a custom class that implemented `__setitem__` and kept the `Counter` and `OrderedDict` instances as underlying attributes. – g.d.d.c Aug 28 '14 at 18:22
  • 1
    The first line has no effect. `my_dict` is assigned to an ordinary `dict` in the second line. – jfs Aug 28 '14 at 18:23
  • @J.F.Sebastian You are right. I wasn't thinking. Corrected my examples. – 0x0 Aug 28 '14 at 18:33
  • `collections.Counter(x[1] for x in my_dict.iterkeys())` will do what your for loop is doing – Padraic Cunningham Aug 28 '14 at 18:46

1 Answers1

4

Getting a Counter to work nicely with an OrderedDict is probably going to require some subclassing. Here's something that might work (I've only implemented __setitem__ and __getitem__, but if you'd like a more robust implementation, let me know):

import collections

class CountedOrderedDict(collections.OrderedDict):
    def __init__(self, *args, **kwargs):
        self.counter = collections.Counter()
        super(CountedOrderedDict, self).__init__(*args, **kwargs)

    def __delitem__(self, key):
        super(CountedOrderedDict, self).__delitem__(key)
        self.counter[key[1]] -= 1

    def __setitem__(self, key, value):
        if key not in self:
            self.counter[key[1]] += 1

        super(CountedOrderedDict, self).__setitem__(key, value)

Example usage:

>>> my_dict = CountedOrderedDict({(123,1): 'sda', (232,1) : 'bfd', (234,2) : 'csd', (6745,2) : 'ds', (456,3) : 'rd'})
>>> my_dict.counter
Counter({'1': 2, '2': 2, '3': 1})
>>> del my_dict[(123,1)]
>>> my_dict.counter
Counter({'2': 2, '1': 1, '3': 1})
>>> my_dict[(150,1)] = "asdf"
>>> my_dict.counter
Counter({'1': 2, '2': 2, '3': 1})

Here's a more general CountedOrderedDict implementation that takes a key function as a parameter.

import collections

class CountedOrderedDict(collections.OrderedDict):
    def __init__(self, key=lambda k: k, *args, **kwargs):
        self.counter = collections.Counter()
        self.key_transform = key
        super(CountedOrderedDict, self).__init__(*args, **kwargs)

    def __delitem__(self, key):
        super(CountedOrderedDict, self).__delitem__(key)
        self.counter[self.key_transform(key)] -= 1

    def __setitem__(self, key, value):
        if key not in self:
            self.counter[self.key_transform(key)] += 1

        super(CountedOrderedDict, self).__setitem__(key, value)

For your needs, you'd instantiate it like so:

my_dict = CountedOrderedDict(key=lambda k: k[1])
Dan Loewenherz
  • 10,879
  • 7
  • 50
  • 81
  • In both classes, I'd suggest that `__delitem__` should reraise the exception it catches, not suppress it. The simplest way to do that may be to just write the `super` call and the decrement, without any `try`/`except` blocks. Any exception raised in `super().__delitem__` will stop the decrement from happening! In the second class's `__init__` method, in Python 3 you'd probably want to make `key` a keyword-only argument by moving it after `*args`. That way you can actually pass positional arguments without the first needing to be `key`. I'd also suggest using a different name than `key`! – Blckknght Aug 28 '14 at 19:04
  • @Blckknght Good suggestion. I was writing a elaborate and an ugly comment to ask about the same issue with the exception. Thanks! – 0x0 Aug 28 '14 at 19:10
  • I changed the order of key in the parameters so that non keyword arguments can be supplied without providing a key, but left "key" as the name of the parameter since it's used similarly in other instances (such as in `sorted` and `max`/`min`). – Dan Loewenherz Aug 28 '14 at 19:13
  • @Dan compiling this gives me an invalid syntax error at the `def __init__(self, *args, key=lambda k: k, **kwargs)` line. I don't understand why to give a suggestion. – 0x0 Aug 28 '14 at 19:19
  • Yikes, this is what happens when I edit code without running it first. :) Fixed. – Dan Loewenherz Aug 28 '14 at 19:22
  • Do you think you know why that happens? According to @Blckknght it looks like moving the `key` after `*args` makes it a keyword-only argument(not sure how), but intuitively it shouldn't throw an error, right? Thanks so much. – 0x0 Aug 28 '14 at 19:26
  • @Sunil Good question. This syntax will work in Python 3 but won't work with Python 2.7 or below. I just changed it back to what I originally had. – Dan Loewenherz Aug 28 '14 at 19:28