3

I know that I can add a new key/value in a python dict by doing

some_dict['absent_key'] = somevalue

But I don't really understand the internals work.

I used to think dictionaries behaved like C++ maps. Where the [] operator would create the element for the given key if it does not already exist, then return a reference to it so that it can be assigned a value in the same line with operator =.

But that behavior in C++ has the consequence that if we query from a map the value for a key that does not exist, then the element is created for that key, and the default value for the value type will be returned instead of an error. In python, this throws a KeyError.

So what I don't understand is: How, since [] operator must be evaluated before = in python too (I think?), does it behave differently depending if the result will be read or assigned a value (which it should not know about at that point of the expression evaluation)?

Is there a difference in the order in which python evaluates expressions? Or is the interpreter simply smarter since dictionaries hare a hardcoded type so it knows more precisely how it behaves, while std::map are in a 'library' so the compiler can assume less? Or some other reason?

MSeifert
  • 145,886
  • 38
  • 333
  • 352
bartoli
  • 173
  • 8
  • 7
    On the lhs, `[]` is mapped to `__setitem__(key, rhs)` on the rhs `[]` maps to `__getitem__(key)` - they implement the different logic. In C++ `[]` is its own operator that returns a reference that can be updated. In python `=` is a statement and the `lhs` is not an expression like in `C++`. – AChampion Aug 18 '17 at 11:46
  • Side note: the access operator `[]` is not necessary: you can do `dictionary.get(key, some_optional_default_value_if_key_not_found)`. – Jared Smith Aug 18 '17 at 11:59

3 Answers3

5

The operations:

some_dict[key]

and

some_dict[key] = value

and

del some_dict[key]

use different special methods of the object: __getitem__, __setitem__ and __delitem__. So it's not just one operator ([]) that implements them all.

Maybe an example can illustrate that:

class Something(dict):  # subclassing dict
    def __getitem__(self, key):
        print('trying to get', key)
        return super().__getitem__(key)
    def __setitem__(self, key, value):
        print('trying to set', key, 'to', value)
        return super().__setitem__(key, value)
    def __delitem__(self, key):
        print('trying to delete', key)
        return super().__delitem__(key)

Test:

>>> s = Something({'a': 1, 'b': 2})
>>> s['a']
trying to get a
1

>>> s['c'] = 10
trying to set c to 10

>>> del s['b']
trying to delete b

So it depends on how they are implemented. In plain Python dicts __getitem__ just returns the value for the key or throws if it's not present.

But subclasses could also implement the __missing__ method - in case they want to customize the behavior if the key wasn't present in the dict (during lookup).

MSeifert
  • 145,886
  • 38
  • 333
  • 352
  • 1
    Cool, so that also means i could subclass dict, and overload __getitem__() to get it to behave like a std::map if i wanted :) – bartoli Aug 18 '17 at 12:03
  • 1
    @bartoli Yes. But I think you don't need to create your own subclass for that: [`collections.defaultdict`](https://docs.python.org/library/collections.html#collections.defaultdict) is also available in Pythons standard library. Not sure it's identical to `std::map` but it's "closer". :) – MSeifert Aug 18 '17 at 12:05
  • 1
    @MSeifert I'd say the behavior is _very_ close ;-) See here: https://stackoverflow.com/questions/10124679/what-happens-if-i-read-a-maps-value-where-the-key-does-not-exist. – Christian Dean Aug 18 '17 at 12:30
  • @ChristianDean I'm always careful when comparing data structures from different languages. If you use absolutes (a from language A is equivalent to b from language B) someone will poke at it and come up with 100 (some valid some ... not) reasons why it's not equivalent. Just my personal experience. – MSeifert Aug 18 '17 at 12:35
  • @MSeifert I completely agree. That's why I was adding a caveat to my answer. When I said "very close" in my comment, I meant purely the _behavior_ of what happens when a key is missing. In both data structures, an error is not raised. Of course there are still differences such as the "factory function" that is called when a key is missing, so I would never claim them to be identical. – Christian Dean Aug 18 '17 at 12:46
1

What's going on behind the scenes?

In Python, When you assign a value to a key:

dictionary[key] = value

Python translates the above syntactic sugar into:

dictionary.__setitem__(key, value)

As you can see, behind the scenes Python calls the __setitem__ method. The __setitem__ method corresponds directly to the operation of indexing a data structure and assigning a new value to said index. It can be overloaded to customize it's behavior.

The default behavior with __setitem__ for Python dictionaries is to change the key's value if it exists, and if not raise a KeyError. To prove this, you can subclass the dict class and overload __setitem__ to display it's arguments:

>>> class Dict(dict):
...     def __setitem__(self, key, value):
...         print('Putting "%s" in dict with value of "%s"' % (key, value))
...         super().__setitem__(key, value)
...
>>>
>>> d = Dict()
>>> d['name'] = 'Hammy'
Putting "name" in dict with value of "Hammy"
>>> d['age'] = 25
Putting "age" in dict with value of "25"
>>> d
{'name': 'Hammy', 'age': 25}

Does Python have an std::map equivalent?

Like @MSeifert said, you can customize what happens when a key is not present by overloading the __missing__ method.

That is what the collections.defaultdict class does in the standard library. It overloads __missing__ to create a missing key and map a default value of your choice to it. Here's the relevant snippet from the CPython source:

static PyObject *
defdict_missing(defdictobject *dd, PyObject *key)
{
    PyObject *factory = dd->default_factory;
    PyObject *value;
    /* ... */
    value = PyEval_CallObject(factory, NULL);
    if (value == NULL)
        return value;
    if (PyObject_SetItem((PyObject *)dd, key, value) < 0) {
        Py_DECREF(value);
        return NULL;
    }
    return value;
}

Note that defaultdict is implemented in C. Here's an example of the usage:

>>> from collections import defaultdict
>>> map = defaultdict(int)
>>> map['a'] = 1
>>> map['b'] = 2
>>> map['c'] # default factory function `int` called
0
>>> map
defaultdict(<class 'int'>, {'a': 1, 'b': 2, 'c': 0})

defaultdict pretty much matches the behavior of the std::map::operator[]. If a key is not present when using std::map::operator[], the operator calls a "factory function" that matches the key's value's expected types, and assigns that to the missing key.

So if you want something that behaves like std::map, use defaultdict. Note I said "like", though. That's because C++ and Python are two complete different languages. Saying a data structure in one language has an exact equivalent in another is very rarely correct.

Christian Dean
  • 22,138
  • 7
  • 54
  • 87
0

The my_dict['key'] = 'value' notation is just sugar for:

my_dict.__setitem__('key', 'value')

That function does all the work of storing the data. It can be implemented, however, you want. The underlying mechanism python interpreters and libraries use is most often from a faster compiled language like C.

There are more functions like this like, __len__(), __getitem__(x), and __delitem__(x) which handle all the other dict like operations.

André C. Andersen
  • 8,955
  • 3
  • 53
  • 79