-1

How to increment d['a']['b']['c'][1][2][3] if d is defaultdict of defaultdict without code dublication?

from collections import defaultdict
nested_dict_type = lambda: defaultdict(nested_dict_type)
nested_dict = nested_dict_type()

# incrementation
if type(nested_dict['a']['b']['c']['d'][1][2][3][4][5][6]) != int:
    nested_dict['a']['b']['c']['d'][1][2][3][4][5][6] = 0
nested_dict['a']['b']['c']['d'][1][2][3][4][5][6] += 1  # ok, now it contains 1

Here we can see that we duplicated (in the code) a chain of keys 3 times.

Question: Is it possible to write a function inc that will take nested_dict['a']['b']...[6] and do the same job as above? So:

def inc(x):
    if type(x) != int:
        x = 0
    x += 1
inc(nested_dict['a']['b']['c']['d'][1][2][3][4][5][6])  # ok, now it contains 1

Update (20 Aug 2018):

There is still no answer to the question. It's clear that there are options "how to do what I want", but the question is straightforward: there is "value", we pass it to a function, function modifies it. It looks that it's not possible. Just a value, without any "additional keys", etc. If it is so, can we make an answer more generic?

Notes:

  1. What is defaultdict of defaultdicts - SO.
  2. This question is not about "storing of integers in a defaultdict", so I'm not looking for a hierarchy of defaultdicts with an int type at the leaves.
  3. Assume that type (int in the examples) is known in advance / can be even parametrized (including the ability to perform += operator) - the question is how to dereference the object, pass it for modification and store back in the context of defaultdict of defaultdicts.
  4. Is the answer to this question related to the mutability? See example below:

Example:

def inc(x):
    x += 1

d = {'a': int(0)}
inc(d['a'])  
# d['a'] == 0, immutable

d = {'a': Int(0)}
inc(d['a'])  
# d['a'] == 1, mutated

Where Int is:

class Int:
    def __init__(self, value):
        self.value = value
    def __add__(self, v):
        self.value += v
        return self
    def __repr__(self):
        return str(self.value)
Pleeea
  • 362
  • 4
  • 12
  • 1
    Please don't use both python-3 and python-2 tags on a question. Only use a version-specific tag if you're asking a version-specific question. – PM 2Ring Aug 17 '18 at 16:21
  • I'm not sure why this question is so downvoted. It may be misguided, but that's all the more reason to provide a sensible answer. It seems pretty clear..Considering at least 2 answers have independently caught on to similar ideas. – jpp Aug 17 '18 at 16:24
  • Please sorry for the mess and long long keys! I did it artificially for you to have a feeling how important it could be not to split all those “a”, “b”, “c” and etc through the code - see my comment below to one of the answers. (The answer is great, but still I’d be glad if you’d have any ideas how to do “exactly that function inc with only one place where they keys are mentioned”). Also sorry for the wrong tags if I’ve confused your regarding python versions! – Pleeea Aug 17 '18 at 16:33
  • 1
    You may find this article helpful: [Facts and myths about Python names and values](http://nedbatchelder.com/text/names.html), which was written by SO veteran Ned Batchelder. – PM 2Ring Aug 17 '18 at 16:35

2 Answers2

2

It's not exactly abut mutability, more about how assignment performs name binding.

When you do x = 0 in your inc function you bind a new object to the name x, and any connection between that name and the previous object bound to that name is lost. That doesn't depend on whether or not x is mutable.

But since x is an item in a mutable object we can achieve what you want by passing the parent mutable object to inc along with the key needed to access the desired item.

from collections import defaultdict

nested_dict_type = lambda: defaultdict(nested_dict_type)
nested_dict = nested_dict_type()

# incrementation
def inc(ref, key):
    if not isinstance(ref[key], int):
        ref[key] = 0
    ref[key] += 1

d = nested_dict['a']['b']['c']['d'][1][2][3][4][5]
inc(d, 6)
print(d)

output

defaultdict(<function <lambda> at 0xb730553c>, {6: 1})

Now we aren't binding a new object, we're merely mutating an existing one, so the original d object gets updated correctly.


BTW, that deeply nested dict is a bit painful to work with. Maybe there's a better way to organize your data... But anyway, one thing that can be handy when working with deep nesting is to use lists or tuples of keys. Eg,

q = nested_dict
keys = 'a', 'b', 'c', 'd', 1, 2, 3, 4, 5
for k in keys:
    q = q[k]

q now refers to nested_dict['a']['b']['c']['d'][1][2][3][4][5]

PM 2Ring
  • 54,345
  • 6
  • 82
  • 182
  • Thank you for the great answer and everyone please take my apologies if the example was messy, but there is a reason for that. In your example you artificially break the “chain of keys”. Imagine that we would have to inc data[planet][region][city]. In your case “first to keys” would go to “d” and the last key would go to second argument of inc. So the chain is split in two places in the code. How do you think: can we avoid it? And place the “final destination in data” just once in one place. Thank you in advance for the answer once again! – Pleeea Aug 17 '18 at 16:28
  • @Pleeea The "chain" is only split in one place. And we _need_ to split it like that because we want to _mutate_ the object passed as the 1st arg of `inc`. There's no other (sane) way to do this in Python. – PM 2Ring Aug 17 '18 at 16:33
  • I was also thinking about that! But why there is no way? I feel that I’m touching something strange, cause in other languages with memory management / pointers it could be totally straightforward. Maybe something with decorators could make the deal? – Pleeea Aug 17 '18 at 16:36
  • 1
    @Pleeea Yes, it would be straight-forward if Python had pointers. But it doesn't, by design. :) Python's data model is quite different to that of C-like languages. It can be annoying at first when it won't let you do things the way you would in those languages, but gradually you'll come to appreciate the benefits of the Python way far outweigh the downsides. FWIW, I was coding in C for almost 30 years before I learned Python. – PM 2Ring Aug 17 '18 at 16:41
  • Thank you! I would suggest us avoid moving the discussion to the questions of "appreciate" and "benefits". You mentioned that the model is quite different: how exactly? Could you please provide evidence that such things I'm asking are not possible? I'm not putting pressure here but would like to have a solid background under this question. For me, it sounds like "it's not pythonic / etc" right now. Can we prove that it's not possible to make a function `wrapper` that allows us to modify the data _by reference_ in a call `inc(wrapper(nd['a']....[6]))`? What is the memory layout? – Pleeea Aug 20 '18 at 08:02
  • @Pleeea Let's say that we have `def wrapper(ref):`. When you do `wrapper(nd['a'])` the `wrapper` function gets passed whatever object the name `nd['a'])` currently refers to, bound to the local name `ref`. If that's a mutable object, then you can mutate it via the `ref` name. But if you do an assignment to `ref`, eg `ref = newobj` then that has _no effect_ on `nd['a']`, it merely binds `newobj` to the local name `ref`. And after that assignment the function will no longer be able to access the `nd['a']` object. – PM 2Ring Aug 20 '18 at 09:33
  • @Pleeea As for memory layout, that's an implementation detail, and mostly irrelevant to plain Python programming (it may be of interest when you're using Numpy, or writing an extension in C, though). You don't really have access to that stuff in normal Python (in standard CPython the id of an object is its address, but that's just an implementation detail). Objects just live in magic object land, and you leave the low-level details to the Python virtual machine. – PM 2Ring Aug 20 '18 at 09:39
  • @Pleeea The article by Ned Batchelder that I linked in the question comments has lots of good info about Python's data model and how argument passing works in Python. Also see [How do I pass a variable by reference?](https://stackoverflow.com/questions/986006/how-do-i-pass-a-variable-by-reference), in particular the accepted answer by Blair Conrad, as well as those of Mark Ransom, David Cournapeau, and Python core dev Raymond Hettinger. Also see Fredrik Lundh's [Call by Object](http://effbot.org/zone/call-by-object.htm) article that those last 2 answers link to. – PM 2Ring Aug 20 '18 at 09:41
1

You can't have multiple default types with defaultdict. You have the following options:

  1. Nested defaultdict of defaultdict objects indefinitely;
  2. defaultdict of int objects, which likely won't suit your needs;
  3. defaultdict of defaultdict down to a specific level with int defined for the last level, e.g. d = defaultdict(lambda: defaultdict(int)) for a single nesting;
  4. Similar to (3), but for counting you can use collections.Counter instead, i.e. d = defaultdict(Counter).

I recommend the 3rd or 4th options if you are always going to go down to a set level. In other words, a scalar value will only be supplied at the nth level, where n is constant.

Otherwise, one manual option is to have a function perform the type-testing. In this case, try / except may be a good alternative. Here we also define a recursive algorithm to allow you to feed a list of keys rather than defining manual __getitem__ calls.

from collections import defaultdict
from functools import reduce
from operator import getitem

nested_dict_type = lambda: defaultdict(nested_dict_type)
d = nested_dict_type()

d[1][2] = 10

def inc(d_in, L):
    try:
        reduce(getitem, L[:-1], d_in)[L[-1]] += 1
    except TypeError:
        reduce(getitem, L[:-1], d_in)[L[-1]] = 1

inc(d, [1, 2])
inc(d, [1, 3])

print(d)

defaultdict({1: defaultdict({2: 11, 3: 1})})
jpp
  • 159,742
  • 34
  • 281
  • 339