Count reads from python dictionary with unpacking

Question

I am interested in counting the number of accesses to a dictionary's values. I am unsure how to include dictionary unpacking in the counter. Any tips?

from collections import defaultdict

class LDict(dict):
    def __init__(self, *args, **kwargs):
        '''
        This is a read-counting dictionary
        '''
        super().__init__(*args, **kwargs)
        self._lookup = defaultdict(int)  # unseen keys start at 0

    def __getitem__(self, key):
        retval = super().__getitem__(key)
        self._lookup[key] += 1
        return retval

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._lookup[key] = 0  # a newly stored value has not been read yet

    def __delitem__(self, key):
        super().__delitem__(key)
        self._lookup.pop(key, None)  # forget the key's read count as well

    def list_unused(self):
        return [key for key in self if self._lookup[key] == 0]

l = LDict(a='apple', b='bugger')

print({**l, **l})
print(l.list_unused())
_ = l['a']
print(l.list_unused())

Pretty sure dictionary unpacking (I assume you mean using `**` as in function calls) is going to use `.items()` to get keys/values. You'll need to provide your own implementation of this. — kindall, Nov 03 '17 at 00:41
@kindall maybe `__iter__`... I figured this out at one point. — juanpa.arrivillaga, Nov 03 '17 at 01:04
@juanpa.arrivillaga. Neither of those methods are called when unpacking. There is a special opcode for it: `BUILD_MAP_UNPACK`. — ekhumoro, Nov 03 '17 at 01:37
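To see the opcode ekhumoro mentions, one can disassemble a function that uses the unpacking syntax. This is a quick sketch, assuming CPython; the opcode name depends on the version: `BUILD_MAP_UNPACK` on the 3.5 to 3.8 releases current when this thread was written and, as far as I know, `DICT_UPDATE` from 3.9 on. `merge` is a hypothetical helper introduced only for the demonstration.

```python
import dis


def merge(a, b):
    # Hypothetical helper: its bytecode shows how {**a, **b} is compiled.
    return {**a, **b}


# Collect the opcode names the compiler emitted for merge().
opnames = [ins.opname for ins in dis.get_instructions(merge)]
print(opnames)
```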
Oddly, it seems that when unpacking a plain `object` subclass that defines `__getitem__` and `keys`, those methods *will* get called. But the same is not true of a `dict` subclass. So maybe python optimises when it detects a `dict` subclass, and just copies the contents directly. — ekhumoro, Nov 03 '17 at 02:18
@ekhumoro you are correct for `dict()` and subclasses of `dict()`, but in the general case Python will use the `__iter__()` method. We can use the `collections.abc.MutableMapping` abstract base class to create a dict-like object that's compatible with the dictionary unpacking syntax while avoiding this special case. — olooney, Nov 03 '17 at 02:22
@olooney. No, `__iter__` is not sufficient to define a mapping, and is not needed at all for unpacking. — ekhumoro, Nov 03 '17 at 02:26
@ekhumoro, my solution below has a fully runnable script with a print statement in `__iter__()`. You can run it yourself in Python 3.5 and verify that the dictionary unpacking `{**l, **l}` causes `l.__iter__()` to be called twice, as evidenced by the message "__iter__ is being called!" being printed to the console twice. — olooney, Nov 03 '17 at 02:34
@olooney. Sure, but that is only because it inherits `MutableMapping`, and `__iter__` is an abstract method, so it must be implemented. I only said that `__iter__` is not **sufficient** to define a mapping and is not **needed** for unpacking. A plain `object` subclass with `__getitem__` and `keys` is the bare minimum required for unpacking a mapping. — ekhumoro, Nov 03 '17 at 02:41
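ekhumoro's minimal case can be checked directly. The sketch below assumes CPython and uses a hypothetical `CountingMapping` class (not from the thread) that defines only `keys()` and `__getitem__()`. Unpacking it with `**` calls `keys()` once per unpacking and then `__getitem__()` for every key, so each value read is counted.

```python
from collections import Counter


class CountingMapping:
    """Hypothetical minimal mapping: not a dict subclass, only keys() and __getitem__()."""

    def __init__(self, **kwargs):
        self._data = dict(kwargs)
        self.reads = Counter()  # per-key read counts

    def keys(self):
        # ** unpacking asks for the keys first...
        return self._data.keys()

    def __getitem__(self, key):
        # ...and then fetches each value with m[key].
        value = self._data[key]
        self.reads[key] += 1
        return value


m = CountingMapping(a='apple', b='bugger')
merged = {**m, **m}
print(merged)
print(dict(m.reads))
```

Because nothing here inherits from `dict`, the interpreter has no shortcut available and has to go through the two methods.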

olooney · Accepted Answer · 2017-11-03T02:18:29.163

You need to override more methods. Access is not centralized through __getitem__(): other methods like copy(), items(), etc. read the stored values without going through __getitem__(). I would assume the ** operator uses items(), but you will need to handle ALL of the methods to keep track of EVERY access. In many cases you will have to make a judgement call. For example, does __repr__() count as an access? The returned string contains every key and value, formatted, so I think it does.
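A quick way to see that access is not centralized: a hypothetical `CountingDict` that only overrides `__getitem__()` records nothing when values are read through `get()`, `items()` or `copy()`. This is a sketch assuming CPython's built-in `dict`.

```python
class CountingDict(dict):
    """Hypothetical dict subclass that only counts reads made through d[key]."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1
        return super().__getitem__(key)


d = CountingDict(a='apple', b='bugger')
d.get('a')       # dict.get reads the value directly
list(d.items())  # so does iterating over the items view
d.copy()         # and so does copying
print(d.reads)   # still 0: none of these went through __getitem__
d['a']
print(d.reads)   # 1: only subscription is counted
```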

I would recommend overriding all of these methods, because you have to do bookkeeping on assignment too.

def __repr__(self):
def __len__(self):
def __iter__(self):
def clear(self):
def copy(self):
def __contains__(self, k):
def get(self, k, default=None):
def update(self, *args, **kwargs):
def keys(self):
def values(self):
def items(self):
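As a sketch of what overriding a few of these looks like, assuming the counts live in a `Counter` attribute called `reads` (a name introduced here, not from the question), the version below routes `get()` through `__getitem__()` and counts every key when `values()` or `items()` is called. Counting at call time rather than during iteration is one of the judgement calls mentioned above. On a dict subclass like this one, `{**d}` is still not counted.

```python
from collections import Counter


class LDict(dict):
    """Sketch: count reads made through a few of the methods listed above."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = Counter()  # hypothetical attribute holding per-key counts

    def __getitem__(self, key):
        value = super().__getitem__(key)
        self.reads[key] += 1
        return value

    def get(self, key, default=None):
        # Count only when the key is actually present.
        if key in self:
            return self[key]
        return default

    def values(self):
        # Judgement call: count every key as soon as values() is requested,
        # even if the caller never iterates over the returned view.
        self.reads.update(super().keys())
        return super().values()

    def items(self):
        # Same judgement call as values().
        self.reads.update(super().keys())
        return super().items()


d = LDict(a='apple', b='bugger')
d.get('a')
d.get('missing')  # absent key: returns None, nothing counted
list(d.items())
print(dict(d.reads))
```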

EDIT: So apparently there's an important caveat here that relates directly to your implementation. If LDict extends dict, then none of these methods are invoked during the dictionary unpacking {**l, **l}.
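The caveat is easy to confirm. The hypothetical `Spy` class below is a dict subclass that logs calls to its `__getitem__()`, `keys()` and `items()` overrides. On CPython, unpacking it leaves the log empty, consistent with ekhumoro's guess that the interpreter copies a dict subclass's contents directly.

```python
class Spy(dict):
    """Hypothetical dict subclass that records which of its overrides are called."""

    def __init__(self, *args, **kwargs):
        self.calls = []  # set before dict.__init__ fills in the data
        super().__init__(*args, **kwargs)

    def __getitem__(self, key):
        self.calls.append('__getitem__')
        return super().__getitem__(key)

    def keys(self):
        self.calls.append('keys')
        return super().keys()

    def items(self):
        self.calls.append('items')
        return super().items()


s = Spy(a='apple', b='bugger')
merged = {**s, **s}
print(merged)
print(s.calls)  # []: the unpacking bypassed every override
```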

Apparently, though, you can follow the advice here and implement LDict without extending dict. This worked for me:

from collections import defaultdict
from collections.abc import MutableMapping

class LDict(MutableMapping):
    def __init__(self, *args, **kwargs):
        '''
        This is a read-counting dictionary
        '''
        self._lookup = defaultdict(int)
        # Accept the same arguments as dict(), including positional ones.
        self.data = dict(*args, **kwargs)

    def __getitem__(self, key):
        retval = self.data[key]
        self._lookup[key] += 1
        return retval

    def __setitem__(self, key, value):
        self.data[key] = value
        self._lookup[key] = self._lookup.default_factory()

    def __delitem__(self, key):
        del self.data[key]
        self._lookup.pop(key, None)  # forget the key's read count as well

    def items(self):
        print('items is being called!')
        yield from self.data.items()

    def __iter__(self):
        print('__iter__ is being called!')
        yield from self.data

    def __len__(self):
        return len(self.data)

    def list_unused(self):
        return [key for key in self if self._lookup[key] == 0]

l = LDict(a='apple', b='bugger')

print({**l, **l})
print(l.list_unused())
_ = l['a']
print(l.list_unused())

which produces the output:

__iter__ is being called!
__iter__ is being called!
{'b': 'bugger', 'a': 'apple'}
__iter__ is being called!
[]
__iter__ is being called!
[]

(I only implemented the bare minimum to get the example to work; I still recommend implementing the set of methods I listed above if you want your counts to be correct!)

So I guess the answer to your question is that you have to:

- Implement the __iter__(self) method (the keys() that unpacking calls on a MutableMapping is built on it).
- DO NOT inherit from dict().

One critique: don't `yield from self.data.items()`; this makes `.items` return a generator object, which loses the very nice `dict.items` object with its set-like operations. So just `return self.data.items()`. Note: `d1, d2 = {'a':3, 'b':1}, {'a':1, 'b':1}; print(d1.items() & d2.items())` — juanpa.arrivillaga, Nov 03 '17 at 03:25
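To make the critique concrete, here is a small sketch comparing the two. `gen_items` is a hypothetical stand-in for an `items()` method written with `yield from`: the real `dict.items()` view supports `&`, while the generator raises `TypeError`.

```python
d1, d2 = {'a': 3, 'b': 1}, {'a': 1, 'b': 1}

# dict.items() returns a view that supports set operations.
common = d1.items() & d2.items()
print(common)  # {('b', 1)}


def gen_items(d):
    # Hypothetical stand-in for an items() method written with yield from.
    yield from d.items()


try:
    gen_items(d1) & gen_items(d2)
    generators_support_and = True
except TypeError as exc:
    generators_support_and = False
    print('generator items:', exc)
```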
And really, for `__iter__` it's probably best to just `return iter(self.data)`. No need to add generator overhead to already relatively slow iteration on `dict` objects. — juanpa.arrivillaga, Nov 03 '17 at 03:30
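Folding these comments back into the answer's class gives something like the following sketch. Assumptions beyond the answer: it imports `MutableMapping` from `collections.abc` (the old `collections` alias was removed in Python 3.10), keeps the counts in a `Counter`, and overrides `__contains__` so that `key in l` is not counted as a read, which is a judgement call. Rather than returning `self.data.items()`, which would keep the set operations but skip the counter, it leaves `items()` to the `MutableMapping` mixin, whose view reads each value through `__getitem__()` and still supports set operations.

```python
from collections import Counter
from collections.abc import MutableMapping


class LDict(MutableMapping):
    """Read-counting mapping that wraps a plain dict instead of inheriting from it."""

    def __init__(self, *args, **kwargs):
        self.data = dict(*args, **kwargs)
        self._lookup = Counter()  # a missing key reads as 0

    def __getitem__(self, key):
        value = self.data[key]
        self._lookup[key] += 1
        return value

    def __setitem__(self, key, value):
        self.data[key] = value
        self._lookup[key] = 0  # a newly stored value has not been read yet

    def __delitem__(self, key):
        del self.data[key]
        self._lookup.pop(key, None)

    def __iter__(self):
        return iter(self.data)  # plain iterator, no generator wrapper

    def __len__(self):
        return len(self.data)

    def __contains__(self, key):
        # Mapping's default __contains__ calls self[key], which would count
        # a membership test as a read; ask the wrapped dict instead.
        return key in self.data

    def list_unused(self):
        return [key for key in self.data if self._lookup[key] == 0]


l = LDict(a='apple', b='bugger')
print('a' in l)         # True, and not counted as a read
print(l.list_unused())  # ['a', 'b']
merged = {**l}          # keys() from the mixin, then __getitem__ per key
print(l.list_unused())  # []
```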
