124

A class has a constructor which takes one parameter:

class C(object):
    def __init__(self, v):
        self.v = v
        ...

Somewhere in the code, it is useful for the values in a dict to know their keys.
I want to use a defaultdict that passes the missing key to each newly created default value:

d = defaultdict(lambda : C(here_i_wish_the_key_to_be))

Any suggestions?

sophros
Benjamin Nitlehoo

6 Answers

171

It hardly qualifies as clever - but subclassing is your friend:

from collections import defaultdict

class keydefaultdict(defaultdict):
    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        else:
            ret = self[key] = self.default_factory(key)
            return ret

d = keydefaultdict(C)
d[x] # returns C(x)
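A minimal usage sketch, assuming the `keydefaultdict` above and a stand-in class `C` like the one in the question. The `created` list is a hypothetical helper, added only to show when the factory actually runs:

```python
from collections import defaultdict


class keydefaultdict(defaultdict):
    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        ret = self[key] = self.default_factory(key)
        return ret


created = []  # hypothetical log of constructor calls, for illustration only


class C(object):
    def __init__(self, v):
        self.v = v
        created.append(v)


d = keydefaultdict(C)
first = d["a"]    # key is missing, so C("a") is built and stored
second = d["a"]   # key now exists, so the stored instance is returned
print(first.v, first is second, created)  # a True ['a']
```

The factory is called once per missing key, and a membership test such as `"b" in d` does not trigger `__missing__`.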
Jochen Ritzel
  • 25
    That's exactly the uglyness I'm trying to avoid... Even using a simple dict and checking for key existence is much cleaner. – Benjamin Nitlehoo May 26 '10 at 11:31
  • 3
    @Paul: and yet this is your answer. Ugliness? Come on! – tzot Jun 25 '10 at 01:18
  • 4
    I think I'm just going to take that bit of code and put it in my personalized general utilities module so I can use it whenever I want. Not too ugly that way... – weronika Sep 07 '11 at 04:28
  • 32
    +1 Directly addresses the OP's question and doesn't look "ugly" to me. Also a good answer because many don't seem to realize that `defaultdict`'s `__missing__()` method can be overridden (as it can in any subclass of the built-in `dict` class since version 2.5). – martineau Jan 01 '12 at 02:15
  • 1
    No, this answer is a nice suggestion, and useful, but clearly not what the OP is asking for. The whole point of this question is to avoid exactly this. – Stuart Berg Jan 29 '16 at 19:13
  • 12
    +1 The whole purpose of `__missing__` is to customize the behavior for missing keys. The dict.setdefault() approach mentioned by @silentghost would also work (on the plus side, setdefault() is short and already exists; on the minus side, it suffers from efficiency issues and no one really likes the name "setdefault"). – Raymond Hettinger Apr 18 '16 at 04:12
  • `dict.setdefault(...)` is a much cleaner way to tackle this. – Muposat Aug 31 '17 at 14:09
  • I disagree about uglyness, It is ugly if you do not plan to reuse it. But for multiple usages, each instance is a single line. – arivero Sep 19 '17 at 12:11
  • any pypi lib has this? does not look ugly but also not fit in app code. – balki Dec 29 '18 at 13:19
  • 1
    Note that you can also directly subclass `dict` instead of subclassing `defaultdict`, and provide your own constructor that takes a `default_factory`. If you're using mypy or some other static type-checker, you may have to take this approach; it won't like the code in this answer as written because `defaultdict`'s `default_factory` property is expected to be a callable with no arguments. – Mark Amery Sep 29 '19 at 15:58
  • I know it's 10 years later but I love this solution! Easy to grok, relatively elegant, and learned a little bit about Python internals – dancow Aug 26 '20 at 00:28
  • 1
    @Muposat: the problem with `setdefault` is that you have to pass the actual default value - as opposed to a function that would only be called to generate it if needed - meaning you'll end up needlessly invoking the function / creating a new instance on each invocation. – Tom Jan 14 '21 at 11:48
  • Unlike the original poster, I find this method both useful and elegant. Too bad that this much-needed functionality isn't the standard. – user2579823 May 26 '21 at 04:26
  • But wouldn't this solution search the key twice (in case it's missing)? One time to discover that it's missing, and then a second time inside __missing__(), when you assign self[key] = ... -- because the assignment automatically searches the key. Right? – Amenhotep Aug 10 '22 at 16:03
  • 1
    @Amenhotep In both cases there is no search O(n), but an access/write with O(1). – Robert Siemer Oct 29 '22 at 04:08
39

No, there is not.

The defaultdict implementation cannot be configured to pass the missing key to the default_factory out of the box. Your only option is to implement your own defaultdict subclass, as suggested by @JochenRitzel, above.

But that isn't "clever" or nearly as clean as a standard library solution would be (if it existed). Thus the answer to your succinct, yes/no question is clearly "No".

It's too bad the standard library is missing such a frequently needed tool.

Stuart Berg
  • 3
    Yep, it would have been a better design choice to let the factory take the key (unary function rather than nullary). It's easy to discard an argument when we want to return a constant. – YvesgereY May 20 '20 at 12:40
  • Although succinctness is an OK goal (obfuscating jargon and 'big words' are also succinct), I find the `defaultdict` to be necessary rather than nice to have when bootstrapping some automatic built-in (like setting up an object reference cache in a constructor). Without the connection to the key, the `defaultdict` is much less useful than it could be, and is not so easy to replace w/out the "ugly" workaround. – Chris May 17 '22 at 19:50
7

I don't think you need defaultdict here at all. Why not just use the dict.setdefault method?

>>> d = {}
>>> d.setdefault('p', C('p')).v
'p'

That will, of course, create a new instance of C on every call, whether or not the key is already present. If that's an issue, I think the simpler approach will do:

>>> d = {}
>>> if 'e' not in d: d['e'] = C('e')

It would be quicker than the defaultdict or any other alternative as far as I can see.
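To make the difference concrete, here is a small sketch comparing the two approaches. The `instances` counter and the `get_or_create` helper are hypothetical names introduced only for illustration; they are not part of the original answer:

```python
class C(object):
    instances = 0  # hypothetical counter, only to show how many objects get built

    def __init__(self, v):
        self.v = v
        C.instances += 1


def get_or_create(d, key, factory):
    """Return d[key], building it with factory(key) only when the key is missing."""
    if key not in d:
        d[key] = factory(key)
    return d[key]


eager = {}
for _ in range(3):
    eager.setdefault("p", C("p"))   # C("p") is evaluated on every call
eager_count = C.instances

C.instances = 0
lazy = {}
for _ in range(3):
    get_or_create(lazy, "p", C)     # C("p") is built only on the first call
lazy_count = C.instances

print(eager_count, lazy_count)  # 3 1
```

With `setdefault` the argument expression is evaluated before the call, so the constructor runs every time; the membership test only constructs a value when the key is absent.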

ETA regarding the speed of the `in` test vs. using a try-except clause:

>>> import timeit
>>> def g():
    d = {}
    if 'a' in d:
        return d['a']


>>> timeit.timeit(g)
0.19638929363557622
>>> def f():
    d = {}
    try:
        return d['a']
    except KeyError:
        return


>>> timeit.timeit(f)
0.6167065411074759
>>> def k():
    d = {'a': 2}
    if 'a' in d:
        return d['a']


>>> timeit.timeit(k)
0.30074866358404506
>>> def p():
    d = {'a': 2}
    try:
        return d['a']
    except KeyError:
        return


>>> timeit.timeit(p)
0.28588609450770264
SilentGhost
  • 8
    This is highly wasteful in cases where d is accessed many times, and only rarely missing a key: C(key) will thus create tons of unneeded objects for the GC to collect. Also, in my case there is an additional pain, since creating new C objects is slow. – Benjamin Nitlehoo May 26 '10 at 11:54
  • @Paul: that's right. I would suggest then even more simple method, see my edit. – SilentGhost May 26 '10 at 12:15
  • I'm not sure it is quicker than defaultdict, but this is what I usually do (see my comment to THC4k's answer). I hoped there is a simple way to hack around the fact default_factory takes no args, to keep the code slightly more elegant. – Benjamin Nitlehoo May 26 '10 at 12:35
  • @Paul: of course it's faster! it's a single `in` statement! It is also clean and readable. `defaultdict` has just different intention behind it. – SilentGhost May 26 '10 at 12:44
  • it is an 'if k in d' vs. (a hidden) 'try: d[k] except KeyError'; CPython's implementation is very fast with exceptions, so should be on the same speed level. – Benjamin Nitlehoo May 26 '10 at 14:04
  • @Paul: you understand that these are different pieces of code, right? Additionally, `in` would always be faster than the try-except clause. – SilentGhost May 26 '10 at 14:10
  • Exceptions are as fast as tests. This is one of the reasons BTAFTP exists alongside LBYL. (Although it turned out to be implementation-specific: IronPython is extremely slow with exceptions, due to .NET design). – Benjamin Nitlehoo May 27 '10 at 09:35
  • 6
    @SilentGhost: I don't understand - how does this solve the OP's problem? I thought OP wanted any attempt to read `d[key]` to return `d[key] = C(key)` if `key not in d`. But your solution requires him to actually go and pre-set `d[key]` in advance? How would he know which `key` he'd need? – max Apr 30 '12 at 16:56
  • Awesome! No ugly code and only uses standard dict: `D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D` – Muposat Aug 31 '17 at 14:07
  • 4
    Because setdefault is ugly as hell and the defaultdict from collections SHOULD support a factory function which receives the key. What a wasted opportunity from the Python designers! – jgomo3 Jul 23 '18 at 21:51
7

I just want to expand on Jochen Ritzel's answer with a version that makes typecheckers happy:

from typing import Callable, TypeVar

K = TypeVar("K")
V = TypeVar("V")

class keydefaultdict(dict[K, V]):
    def __init__(self, default_factory: Callable[[K], V]):
        super().__init__()
        self.default_factory = default_factory

    def __missing__(self, key: K) -> V:
        if self.default_factory is None:
            raise KeyError(key)
        else:
            ret = self[key] = self.default_factory(key)
            return ret
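A short usage sketch of the typed version, assuming Python 3.9+ (needed for subscripting `dict` in the class definition). The `word_length` factory is a hypothetical example function, not part of the original answer:

```python
from typing import Callable, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class keydefaultdict(dict[K, V]):
    def __init__(self, default_factory: Callable[[K], V]):
        super().__init__()
        self.default_factory = default_factory

    def __missing__(self, key: K) -> V:
        if self.default_factory is None:
            raise KeyError(key)
        else:
            ret = self[key] = self.default_factory(key)
            return ret


def word_length(word: str) -> int:
    # Hypothetical factory: derives the default value from the key itself.
    return len(word)


lengths = keydefaultdict(word_length)
n = lengths["hello"]      # missing key, so word_length("hello") is stored
print(n, dict(lengths))   # 5 {'hello': 5}
```

Because the factory is declared as `Callable[[K], V]`, a type checker should be able to infer the key and value types from the function passed in.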
Paulo Costa
2

Here's a working example of a dictionary that automatically adds a value. The demonstration task is finding duplicate file names under /usr/include. Note that customizing the dictionary (PathDict) only requires four lines:

import os

class FullPaths:

    def __init__(self,filename):
        self.filename = filename
        self.paths = set()

    def record_path(self,path):
        self.paths.add(path)

class PathDict(dict):

    def __missing__(self, key):
        ret = self[key] = FullPaths(key)
        return ret

if __name__ == "__main__":
    pathdict = PathDict()
    for root, _, files in os.walk('/usr/include'):
        for f in files:
            path = os.path.join(root,f)
            pathdict[f].record_path(path)
    for fullpath in pathdict.values():
        if len(fullpath.paths) > 1:
            print("{} located in {}".format(fullpath.filename,','.join(fullpath.paths)))
gerardw
0

Another way to achieve something similar is with a decorator:

from collections import defaultdict
from typing import Any, Callable

def initializer(cls: type):
    def argument_wrapper(
        *args: Any, **kwargs: Any
    ) -> Callable[[], 'X']:
        def wrapper():
            return cls(*args, **kwargs)

        return wrapper

    return argument_wrapper


@initializer
class X:
    def __init__(self, *, some_key: int, foo: int = 10, bar: int = 20) -> None:
        self._some_key = some_key
        self._foo = foo
        self._bar = bar

    @property
    def key(self) -> int:
        return self._some_key

    @property
    def foo(self) -> int:
        return self._foo

    @property
    def bar(self) -> int:
        return self._bar

    def __str__(self) -> str:
        return f'[Key: {self.key}, Foo: {self.foo}, Bar: {self.bar}]'

Then you can create a defaultdict like so:

>>> d = defaultdict(X(some_key=10, foo=15, bar=20))
>>> print(d['baz'])
[Key: 10, Foo: 15, Bar: 20]
>>> print(d['qux'])
[Key: 10, Foo: 15, Bar: 20]

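The default_factory will create new instances of X with the specified arguments. Note that the missing key itself ('baz', 'qux') is not passed to X; every default value is built from the same fixed arguments.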

Of course, this is only useful if you know that the class will be used in a default_factory. Otherwise, in order to instantiate the class directly, you would need to do something like:

x = X(some_key=10, foo=15)()

That is kind of ugly. If you want to avoid it, at the cost of some added complexity, you could add a keyword parameter such as factory to argument_wrapper, which allows both behaviours:

def initializer(cls: type):
    def argument_wrapper(
        *args: Any, factory: bool = False, **kwargs: Any
    ) -> Callable[[], 'X']:
        def wrapper():
            return cls(*args, **kwargs)

        if factory:
            return wrapper
        return cls(*args, **kwargs)

    return argument_wrapper

You could then use the class like so:

>>> print(X(some_key=10, foo=15))
[Key: 10, Foo: 15, Bar: 20]
>>> d = defaultdict(X(some_key=15, foo=15, bar=25, factory=True))
>>> print(d['baz'])
[Key: 15, Foo: 15, Bar: 25]
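For comparison, the standard library's `functools.partial` binds constructor arguments in much the same way without decorating the class. This is only a sketch with a simplified, undecorated stand-in for `X`; like the decorator above, it does not pass the missing key to the constructor:

```python
from collections import defaultdict
from functools import partial


class X:
    # Simplified stand-in for the class above, without the decorator.
    def __init__(self, *, some_key: int, foo: int = 10, bar: int = 20) -> None:
        self.some_key = some_key
        self.foo = foo
        self.bar = bar

    def __repr__(self) -> str:
        return f"[Key: {self.some_key}, Foo: {self.foo}, Bar: {self.bar}]"


# partial(...) returns a zero-argument callable, which is what defaultdict expects.
d = defaultdict(partial(X, some_key=15, foo=15, bar=25))
print(d["baz"])               # [Key: 15, Foo: 15, Bar: 25]
print(d["baz"] is d["qux"])   # False: each missing key gets its own instance
```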
Jake