53

I am building a class which subclasses dict, and overrides __setitem__. I would like to be certain that my method will be called in all instances where dictionary items could possibly be set.

I have discovered three situations where Python (in this case, 2.6.4) does not call my overridden __setitem__ method when setting values, and instead calls PyDict_SetItem directly:

  1. In the constructor
  2. In the setdefault method
  3. In the update method

As a very simple test:

class MyDict(dict):
    def __setitem__(self, key, value):
        print "Here"
        super(MyDict, self).__setitem__(key, str(value).upper())

>>> a = MyDict(abc=123)
>>> a['def'] = 234
Here
>>> a.update({'ghi': 345})
>>> a.setdefault('jkl', 456)
456
>>> print a
{'jkl': 456, 'abc': 123, 'ghi': 345, 'def': '234'}

You can see that the overridden method is only called when setting the items explicitly. To get Python to always call my __setitem__ method, I have had to reimplement those three methods, like this:

class MyUpdateDict(dict):
    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        print "Here"
        super(MyUpdateDict, self).__setitem__(key, value)

    def update(self, *args, **kwargs):
        if args:
            if len(args) > 1:
                raise TypeError("update expected at most 1 arguments, got %d" % len(args))
            other = dict(args[0])
            for key in other:
                self[key] = other[key]
        for key in kwargs:
            self[key] = kwargs[key]

    def setdefault(self, key, value=None):
        if key not in self:
            self[key] = value
        return self[key]

Are there any other methods which I need to override, in order to know that Python will always call my __setitem__ method?

UPDATE

Per gs's suggestion, I've tried subclassing UserDict (actually, IterableUserDict, since I want to iterate over the keys) like this:

from UserDict import IterableUserDict, UserDict
class MyUserDict(IterableUserDict):
    def __init__(self, *args, **kwargs):
        UserDict.__init__(self,*args,**kwargs)

    def __setitem__(self, key, value):
        print "Here"
        UserDict.__setitem__(self,key, value)

This class seems to correctly call my __setitem__ on setdefault, but it doesn't call it on update, or when initial data is provided to the constructor.

UPDATE 2

Peter Hansen's suggestion got me to look more carefully at dictobject.c, and I realised that the update method could be simplified a bit, since the built-in dictionary constructor simply calls the built-in update method anyway. It now looks like this:

def update(self, *args, **kwargs):
    if len(args) > 1:
        raise TypeError("update expected at most 1 arguments, got %d" % len(args))
    other = dict(*args, **kwargs)
    for key in other:
        self[key] = other[key]
Ian Clelland
  • Does it work if you subclass UserDict instead of the normal dict? – Georg Schölly Jan 13 '10 at 23:03
  • I honestly didn't realise that UserDict was still around :) I'll try it – Ian Clelland Jan 13 '10 at 23:31
  • Userdict would have been nice, but unfortunately, its update method simply calls update on the underlying data dictionary. – Ian Clelland Jan 13 '10 at 23:56
  • When you re-implement update(), be sure to check `help(dict.update)` for details of what it actually does. I just went through this myself... your version is not an equivalent reimplementation. – Peter Hansen Jan 14 '10 at 01:20
  • @Peter: My reasoning was that since dict's constructor essentially calls update(), then calling dict(args[0]) should do exactly what is required/expected by update. Can you see a situation where I would get different behaviour from a built-in dict? – Ian Clelland Jan 14 '10 at 16:20
  • Doesn't seem to handle nested dicts (where presumably, you would want them to be created with your subclassed dicts). – Nisan.H Aug 02 '12 at 16:38

4 Answers

55

I'm answering my own question, since I eventually decided that I really do want to subclass dict, rather than creating a new mapping class, and UserDict still defers to the underlying dict object in some cases, rather than using the provided __setitem__.

After reading and re-reading the Python 2.6.4 source (mostly Objects/dictobject.c, but I grepped everywhere else to see where the various methods are used), my understanding is that the following code is sufficient to have my __setitem__ called every time that the object is changed, and to otherwise behave exactly as a Python dict:

Peter Hansen's suggestion got me to look more carefully at dictobject.c, and I realised that the update method in my original answer could be simplified a bit, since the built-in dictionary constructor simply calls the built-in update method anyway. So the second update in my answer has been added to the code below (by some helpful person ;-).

class MyUpdateDict(dict):
    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        # optional processing here
        super(MyUpdateDict, self).__setitem__(key, value)

    def update(self, *args, **kwargs):
        if args:
            if len(args) > 1:
                raise TypeError("update expected at most 1 arguments, "
                                "got %d" % len(args))
            other = dict(args[0])
            for key in other:
                self[key] = other[key]
        for key in kwargs:
            self[key] = kwargs[key]

    def setdefault(self, key, value=None):
        if key not in self:
            self[key] = value
        return self[key]

I've tested it with this code:

def test_updates(dictish):
    dictish['abc'] = 123
    dictish.update({'def': 234})
    dictish.update(red=1, blue=2)
    dictish.update([('orange', 3), ('green',4)])
    dictish.update({'hello': 'kitty'}, black='white')
    dictish.update({'yellow': 5}, yellow=6)
    dictish.setdefault('brown',7)
    dictish.setdefault('pink')
    try:
        dictish.update({'gold': 8}, [('purple', 9)], silver=10)
    except TypeError:
        pass
    else:
        raise RuntimeError("Error did not occur as planned")

python_dict = dict([('b',2),('c',3)],a=1)
test_updates(python_dict)

my_dict = MyUpdateDict([('b',2),('c',3)],a=1)
test_updates(my_dict)

and it passes. All other implementations I've tried have failed at some point. I'll still accept any answers that show me that I've missed something, but otherwise, I'm ticking the checkmark beside this one in a couple of days, and calling it the right answer :)
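The test above confirms that the calls succeed, but not that __setitem__ was actually reached for each key. A minimal sketch of a stricter check follows, assuming a hypothetical RecordingDict subclass and a hypothetical helper check_all_sets_recorded (neither is part of the original answer). It uses the simplified update from UPDATE 2 of the question and should run on both Python 2 and Python 3:

```python
class RecordingDict(dict):
    """Hypothetical dict subclass that logs every key passed to __setitem__."""

    def __init__(self, *args, **kwargs):
        self.set_calls = []  # keys seen by __setitem__, in call order
        super(RecordingDict, self).__init__()
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        self.set_calls.append(key)
        super(RecordingDict, self).__setitem__(key, value)

    def update(self, *args, **kwargs):
        if len(args) > 1:
            raise TypeError("update expected at most 1 arguments, "
                            "got %d" % len(args))
        # Let the built-in constructor normalise the arguments first.
        other = dict(*args, **kwargs)
        for key in other:
            self[key] = other[key]

    def setdefault(self, key, value=None):
        if key not in self:
            self[key] = value
        return self[key]


def check_all_sets_recorded(d):
    """Return True if every key stored in d went through __setitem__."""
    return set(d) == set(d.set_calls)


d = RecordingDict([('b', 2), ('c', 3)], a=1)
d['abc'] = 123
d.update({'def': 234})
d.update(red=1, blue=2)
d.update([('orange', 3), ('green', 4)])
d.setdefault('brown', 7)
d.setdefault('pink')
assert check_all_sets_recorded(d)
```

If any of the three overrides (__init__, update, setdefault) is removed from the sketch, the final assertion should fail, which is the behaviour the question describes.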

martineau
Ian Clelland
  • If I understand what you describe in UPDATE 2 of your question, I think you should also update the code in this answer of yours that you accepted. – martineau Dec 28 '11 at 14:18
  • Although your code for testing isn't very rigorous, in that it doesn't even check to see if your dict subclass methods are being called when it's passed a `MyUpdateDict` instance, my own version of it that checks that indicates it does, so my up-vote stands. – martineau Apr 16 '13 at 19:56
  • have you per chance investigated this for `defaultdict` and/or more recent versions of python (2.7 in particular)? – drevicko Jul 01 '14 at 15:02
  • @Ian Clelland: An excellent question and answer. In addition to being interesting and quite edifying overall, I also found it very helpful in a particular problem I've been facing. Cheers! – cfwschmidt Oct 12 '15 at 20:02
  • @drevicko It seems a subclass of defaultdict will override `__setitem__` for the purpose of setting values to defaults. I need a way to circumvent this, though, as I want to disallow directly setting items on the subclass... – Elias Hasle May 18 '21 at 13:08
  • PS: It turned out subclassing `dict` and overriding both `__missing__`, `get` and the explicitly mutating methods (`pop`, `popitem`, `update`, `clear`) and dunder methods (`__delitem__`) was easier than making a `defaultdict` work. (All my reported results here are with python 3.) – Elias Hasle May 19 '21 at 10:09
4

What is your use-case for subclassing dict?

You don't need to do this to implement a dict-like object, and it might be simpler in your case to write an ordinary class, then add support for the required subset of the dict interface.

The best way to accomplish what you're after is probably the MutableMapping abstract base class. PEP 3119 -- Introducing Abstract Base Classes

This will also help you answer the question "Are there any other methods which I need to override?". You will need to override all the abstract methods. For MutableMapping, the abstract methods include __setitem__ and __delitem__; the concrete methods include pop, popitem, clear, and update.
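As a rough illustration of the MutableMapping approach, here is a minimal sketch. The class name ValidatedMapping, the private _data attribute, and the string-key validation rule are all assumptions made for the example. The inherited update and setdefault are implemented in terms of __setitem__, so the validation is applied on every write path:

```python
try:
    from collections.abc import MutableMapping  # Python 3.3+
except ImportError:
    from collections import MutableMapping      # Python 2.6 / 2.7


class ValidatedMapping(MutableMapping):
    """Hypothetical mapping that keeps its data in a private dict."""

    def __init__(self, *args, **kwargs):
        self._data = {}
        # The inherited update() stores items via self[key] = value.
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        # Example validation (an assumption): only string keys are accepted.
        if not isinstance(key, str):
            raise TypeError("keys must be strings, got %r" % (key,))
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


m = ValidatedMapping(a=1)
m.update({'b': 2})
m.setdefault('c', 3)
```

One trade-off to keep in mind: an instance of such a class is not a dict subclass, so isinstance(m, dict) is False, which may matter if calling code checks for dict explicitly.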

mluebke
  • I'm overriding setitem in order to do some data validation. The object is essentially a dictionary with some extra methods that act on its data, and I'd like the coders using it to be able to use it exactly as they would use a dictionary, without a long list of methods to stay away from. I'll take a look at ABCs; If they work, then I will have to see if I can upgrade all of the deployment environments to 2.6 – Ian Clelland Jan 13 '10 at 23:47
4

I found Ian's answer and comments very helpful and clear. I would just point out that calling the super-class __init__ method first might be safer, even when it is not strictly necessary. I recently needed to implement a custom OrderedDict (I'm working with Python 2.7). After implementing and modifying my code according to the proposed MyUpdateDict implementation, I found that simply replacing

class MyUpdateDict(dict):

with:

from collections import OrderedDict
class MyUpdateDict(OrderedDict):

then the test code posted above failed:

Traceback (most recent call last):
  File "Desktop/test_updates.py", line 52, in <module>
    my_dict = MyUpdateDict([('b',2),('c',3)],a=1)
  File "Desktop/test_updates.py", line 5, in __init__
    self.update(*args, **kwargs)
  File "Desktop/test_updates.py", line 18, in update
    self[key] = other[key]
  File "Desktop/test_updates.py", line 9, in __setitem__
    super(MyUpdateDict, self).__setitem__(key, value)
  File "/usr/lib/python2.7/collections.py", line 59, in __setitem__
    root = self.__root
AttributeError: 'MyUpdateDict' object has no attribute '_OrderedDict__root'

Looking at the collections.py code, it turns out that OrderedDict needs its __init__ method to be called in order to initialize and set up the necessary custom attributes.

Therefore, by simply adding a first call to the super __init__ method,

from collections import OrderedDict
class MyUpdateDict(OrderedDict):
    def __init__(self, *args, **kwargs):
        super(MyUpdateDict, self).__init__() #<-- HERE call to super __init__
        self.update(*args, **kwargs)

we have a more general solution which apparently works for both dict and OrderedDict.

I cannot say whether this solution is generally valid, because I tested it with OrderedDict only. However, when extending other dict subclasses, a call to the super __init__ method is likely to be either harmless or necessary, rather than harmful.
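For reference, a complete sketch of the OrderedDict variant might look like the following. The class name MyUpdateOrderedDict and the upper-casing in __setitem__ are assumptions for illustration only. The update method here follows the behaviour described by help(dict.update) instead of building an intermediate dict, so that insertion order is preserved on interpreters where a plain dict is unordered:

```python
from collections import OrderedDict


class MyUpdateOrderedDict(OrderedDict):
    def __init__(self, *args, **kwargs):
        # Let the base class set up its internal state first.
        super(MyUpdateOrderedDict, self).__init__()
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        # Illustrative processing (an assumption): store the upper-cased string.
        super(MyUpdateOrderedDict, self).__setitem__(key, str(value).upper())

    def update(self, *args, **kwargs):
        if len(args) > 1:
            raise TypeError("update expected at most 1 arguments, "
                            "got %d" % len(args))
        if args:
            other = args[0]
            if hasattr(other, 'keys'):
                # Mapping-like argument: look each key up in turn.
                for key in other.keys():
                    self[key] = other[key]
            else:
                # Otherwise treat it as an iterable of (key, value) pairs.
                for key, value in other:
                    self[key] = value
        for key in kwargs:
            self[key] = kwargs[key]

    def setdefault(self, key, value=None):
        if key not in self:
            self[key] = value
        return self[key]


od = MyUpdateOrderedDict([('b', 'x'), ('c', 'y')])
od.update([('a', 'z')])
od.setdefault('d', 'w')
```

Every value in od has passed through __setitem__, and the keys keep the order in which they were inserted.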

rizac
0

Use object.keyname = value instead of object["keyname"] = value

  • While you *could* do this to save a key-value pair... it doesn't help in fixing the issue and this question doesn't require a workaround (based on the accepted answer...) – Jared Apr 08 '13 at 08:34