100

I am debugging some code and I want to find out when a particular dictionary is accessed. Well, it's actually a class that subclasses dict and implements a couple of extra features. Anyway, what I would like to do is subclass dict myself and override __getitem__ and __setitem__ to produce some debugging output. Right now, I have

class DictWatch(dict):
    def __init__(self, *args):
        dict.__init__(self, args)

    def __getitem__(self, key):
        val = dict.__getitem__(self, key)
        log.info("GET %s['%s'] = %s" % str(dict.get(self, 'name_label')), str(key), str(val)))
        return val

    def __setitem__(self, key, val):
        log.info("SET %s['%s'] = %s" % str(dict.get(self, 'name_label')), str(key), str(val)))
        dict.__setitem__(self, key, val)

'name_label' is a key which will eventually be set that I want to use to identify the output. I have then changed the class I am instrumenting to subclass DictWatch instead of dict and changed the call to the superconstructor. Still, nothing seems to be happening. I thought I was being clever, but I wonder if I should be going a different direction.

Thanks for the help!

wjandrea
Michael Mior

6 Answers

86

Another issue when subclassing dict is that the built-in __init__ doesn't call update, and the built-in update doesn't call __setitem__. So, if you want all item assignments to go through your __setitem__ method, you have to route them there yourself:

class DictWatch(dict):
    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        val = dict.__getitem__(self, key)
        print('GET', key)
        return val

    def __setitem__(self, key, val):
        print('SET', key, val)
        dict.__setitem__(self, key, val)

    def __repr__(self):
        dictrepr = dict.__repr__(self)
        return '%s(%s)' % (type(self).__name__, dictrepr)
        
    def update(self, *args, **kwargs):
        print('update', args, kwargs)
        for k, v in dict(*args, **kwargs).items():
            self[k] = v
Matt Anderson
  • 17
    If you are using Python 3, you'll want to change this example so that `print` is the `print()` function and the `update()` method uses `items()` instead of `iteritems()`. – Al Sweigart Sep 18 '17 at 04:01
  • I have tried your solution, but it seems to work for only one level of indexing (i.e., dict[key] and not dict[key1][key2] ...) – ndrwnaguib Apr 04 '19 at 16:42
  • d[key1] returns something, perhaps a dictionary. The second key indexes that. This technique can’t work unless that returned thing supports the watch behavior also. – Matt Anderson Apr 04 '19 at 16:48
  • 1
    @AndrewNaguib: Why should it work with nested arrays? Nested arrays do not work with a normal Python dict either (unless you implement that yourself) – Igor Chubin May 01 '19 at 11:32
  • Yes I did not know so :), for nested indexing level `DictWatch(val)` should be returned instead. – ndrwnaguib May 01 '19 at 11:34
  • 1
    @AndrewNaguib: `__getitem__` would need to test `val` and only do that conditionally — i.e. `if isinstance(val, dict): ...` – martineau Sep 18 '19 at 18:46
  • 1
    Having to override 5 methods for a simple case feels overcomplicated. This is why `collections.UserDict` exists. `UserDict` only requires overriding `__setitem__` to be compatible with `__init__`, `setdefault`, `update`, ... – Conchylicultor Nov 02 '20 at 17:01
  • 1
    Subclassing `MutableMapping` or `UserDict` is preferred over subclassing `dict` in most cases. However `UserDict` does not subclass `dict` so if you need the real builtin python `dict` as your parent class, this does not help you. @Conchylicultor – Matt Anderson Nov 18 '20 at 18:47
  • Does the `update` method take any more argument than a positional argument for the other dictionary that is used to update the first dictionary? – HelloGoodbye Jul 19 '22 at 09:10
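The nested-indexing point raised in the comments above can be sketched as follows. This is only one possible approach, not part of the original answer: `__getitem__` wraps a plain nested dict in `DictWatch` on first access and stores the wrapper back, so that `d[key1][key2]` is watched at both levels. Note the assumption that replacing the stored value with a wrapped copy is acceptable; any other reference to the original inner dict will no longer see later changes.

```python
class DictWatch(dict):
    def __getitem__(self, key):
        val = dict.__getitem__(self, key)
        print('GET', key)
        # Wrap plain nested dicts so that the next [] access is watched too.
        if isinstance(val, dict) and not isinstance(val, DictWatch):
            val = DictWatch(val)
            # Store the wrapper back so later mutations are not lost.
            dict.__setitem__(self, key, val)
        return val

    def __setitem__(self, key, val):
        print('SET', key, val)
        dict.__setitem__(self, key, val)


d = DictWatch({'outer': {'inner': 1}})
d['outer']['inner']        # prints "GET outer", then "GET inner"
d['outer']['inner'] = 2    # prints "GET outer", then "SET inner 2"
```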
45

What you're doing should absolutely work. I tested out your class, and aside from a missing opening parenthesis in your log statements, it works just fine. There are only two things I can think of. First, is the output of your log statement set correctly? You might need to put a logging.basicConfig(level=logging.DEBUG) at the top of your script.

Second, __getitem__ and __setitem__ are only called during [] accesses. So make sure you only access DictWatch via d[key], rather than through methods such as d.get(), d.setdefault() or d.update(), which bypass these overrides on a dict subclass.
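Both points can be checked with a small sketch. It assumes a module-level logger named `log` (the question does not show how `log` is created) and uses the lazy `log.info('%s', arg)` form instead of the `%` operator.

```python
import logging

# Without this call the root logger defaults to WARNING and info() is dropped.
logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger(__name__)  # assumed logger; not shown in the question


class DictWatch(dict):
    def __getitem__(self, key):
        val = dict.__getitem__(self, key)
        # Arguments are passed separately; logging formats them only if needed.
        log.info("GET %s[%r] = %r", dict.get(self, 'name_label'), key, val)
        return val

    def __setitem__(self, key, val):
        log.info("SET %s[%r] = %r", dict.get(self, 'name_label'), key, val)
        dict.__setitem__(self, key, val)


d = DictWatch()
d['name_label'] = 'demo'   # logged: goes through __setitem__
d['x'] = 1                 # logged
d['x']                     # logged: goes through __getitem__
d.get('x')                 # not logged: dict.get bypasses __getitem__
d.setdefault('y', 2)       # not logged: dict.setdefault bypasses __setitem__
```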

adamJLev
BrainCore
  • Actually it's not extra parens, but a missing opening paren around `(str(dict.get(self, 'name_label')), str(key), str(val)))` – cobbal Mar 06 '10 at 00:44
  • 3
    True. To the OP: For future reference, you can simply do log.info('%s %s %s', a, b, c), instead of a Python string formatting operator. – BrainCore Mar 06 '10 at 00:50
  • Logging level ended up being the issue. I'm debugging someone else's code and I was originally testing in another file which had a different level of debugging set. Thanks! – Michael Mior Mar 06 '10 at 03:01
25

Consider subclassing UserDict or UserList. These classes are intended to be subclassed, whereas the built-in dict and list are not: the built-ins contain optimisations that can bypass methods you override.
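As a rough illustration applied to the question's use case, here is a minimal sketch of a watcher built on `collections.UserDict`. The class name `WatchedDict` and the `events` list are hypothetical, chosen so the recorded accesses are easy to inspect; a real version would probably log instead.

```python
from collections import UserDict


class WatchedDict(UserDict):
    """Hypothetical watcher that records accesses in a list."""

    def __init__(self, *args, **kwargs):
        # Must exist before UserDict.__init__ runs, because it calls update().
        self.events = []
        super().__init__(*args, **kwargs)

    def __getitem__(self, key):
        val = super().__getitem__(key)
        self.events.append(('GET', key, val))
        return val

    def __setitem__(self, key, val):
        self.events.append(('SET', key, val))
        super().__setitem__(key, val)


d = WatchedDict(a=1)   # recorded: __init__ goes through __setitem__
d.update(b=2)          # recorded
d['c'] = 3             # recorded
d['a']                 # recorded
print(d.events)
```

The underlying plain dict stays available as `d.data`, which is handy when the contents need to be read without triggering the watcher.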

wjandrea
andrew pate
  • 18
    For reference, the [documentation](https://docs.python.org/3.6/library/collections.html?highlight=userdict#collections.UserDict) in Python 3.6 says "The need for this class has been partially supplanted by the ability to subclass directly from dict; however, this class can be easier to work with because the underlying dictionary is accessible as an attribute". – Sean Sep 16 '18 at 17:33
  • 1
    @andrew an example might be helpful. – Vasantha Ganesh Sep 26 '19 at 09:40
  • 3
    @VasanthaGaneshK https://treyhunner.com/2019/04/why-you-shouldnt-inherit-from-list-and-dict-in-python/ – SirDorius Feb 11 '20 at 15:53
9

This should not really change the result (your class should work once the logging threshold is set appropriately), but your __init__ should be:

def __init__(self, *args, **kwargs):
    dict.__init__(self, *args, **kwargs)

instead, because if you call your constructor as DictWatch([(1,2),(2,3)]) or DictWatch(a=1,b=2), the original version will fail.

(or,better, don't define a constructor for this)
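To make the difference concrete, here is a small sketch comparing the two constructors. The class names `BrokenInit` and `FixedInit` are made up for this illustration; `BrokenInit` copies the signature from the question.

```python
class BrokenInit(dict):
    # Same signature as in the question: args is forwarded as ONE tuple.
    def __init__(self, *args):
        dict.__init__(self, args)


class FixedInit(dict):
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)


pairs = [(1, 2), (2, 3)]

print(FixedInit(pairs))       # {1: 2, 2: 3}
print(FixedInit(a=1, b=2))    # {'a': 1, 'b': 2}

# The one-element tuple (pairs,) is read as a single key/value pair.
print(BrokenInit(pairs))      # {(1, 2): (2, 3)}

try:
    BrokenInit(a=1, b=2)      # keyword arguments are not accepted at all
except TypeError as exc:
    print('TypeError:', exc)
```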

makapuf
9

As Andrew Pate's answer proposed, subclassing collections.UserDict instead of dict is much less error-prone.

Here is an example showing an issue when inheriting dict naively:

class MyDict(dict):

  def __setitem__(self, key, value):
    super().__setitem__(key, value * 10)


d = MyDict(a=1, b=2)  # Bad! MyDict.__setitem__ not called
d.update(c=3)  # Bad! MyDict.__setitem__ not called
d['d'] = 4  # Good!
print(d)  # {'a': 1, 'b': 2, 'c': 3, 'd': 40}

UserDict inherits from collections.abc.MutableMapping, so this works as expected:

import collections


class MyDict(collections.UserDict):

  def __setitem__(self, key, value):
    super().__setitem__(key, value * 10)


d = MyDict(a=1, b=2)  # Good: MyDict.__setitem__ correctly called
d.update(c=3)  # Good: MyDict.__setitem__ correctly called
d['d'] = 4  # Good
print(d)  # {'a': 10, 'b': 20, 'c': 30, 'd': 40}

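Similarly, you only have to override __getitem__ for my_dict.get to pick up the new behaviour. (UserDict implements key in my_dict by checking its underlying data directly, so that check does not go through __getitem__.)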

Note: UserDict is not a subclass of dict, so isinstance(UserDict(), dict) returns False (but isinstance(UserDict(), collections.abc.MutableMapping) returns True).
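A quick check of what this note means in practice, as a sketch. `json.dumps` is used here only as one example of library code that expects a real dict; other APIs may or may not behave the same way.

```python
import collections
import collections.abc
import json


class MyDict(collections.UserDict):
    pass


d = MyDict(a=1)

print(isinstance(d, dict))                            # False
print(isinstance(d, collections.abc.MutableMapping))  # True
print(isinstance(d.data, dict))                       # True: the wrapped dict

# Code that checks for a real dict rejects the UserDict instance.
try:
    json.dumps(d)
except TypeError as exc:
    print('TypeError:', exc)

# Converting to a plain dict first works.
print(json.dumps(dict(d)))                            # {"a": 1}
```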

Michael
Conchylicultor
1

All you will have to do is

class BatchCollection(dict):
    def __init__(self, inpt={}):
        super(BatchCollection, self).__init__(inpt)

A sample usage for my personal use

### EXAMPLE
import collections.abc


class BatchCollection(dict):
    def __init__(self, inpt={}):
        super(BatchCollection, self).__init__(inpt)

    def __setitem__(self, key, item):
        # collections.Iterable was removed in Python 3.10; use collections.abc.
        if (isinstance(key, tuple) and len(key) == 2
                and isinstance(item, collections.abc.Iterable)):
            # self.__dict__[key] = item
            super(BatchCollection, self).__setitem__(key, item)
        else:
            raise Exception(
                "Valid key should be a tuple (database_name, table_name) "
                "and value should be iterable")

Note: tested only in python3

ravi404