Use cases for the 'setdefault' dict method

Question

The addition of collections.defaultdict in Python 2.5 greatly reduced the need for dict's setdefault method. This question is for our collective education:

What is setdefault still useful for, today in Python 2.6/2.7?
What popular use cases of setdefault were superseded with collections.defaultdict?

score 258 · Accepted Answer · edited Feb 24 '20 at 01:09


You could say defaultdict is useful for setting defaults before filling the dict, and setdefault is useful for setting defaults while or after filling the dict.

Probably the most common use case: grouping items (in unsorted data; for sorted data, use itertools.groupby)

# really verbose
new = {}
for (key, value) in data:
    if key in new:
        new[key].append( value )
    else:
        new[key] = [value]


# easy with setdefault
new = {}
for (key, value) in data:
    group = new.setdefault(key, []) # key might exist already
    group.append( value )


# even simpler with defaultdict 
from collections import defaultdict
new = defaultdict(list)
for (key, value) in data:
    new[key].append( value ) # all keys have a default already

Sometimes you want to make sure that specific keys exist after creating a dict. defaultdict doesn't work in this case, because it only creates keys on explicit access. Suppose you are working with something HTTP-ish that has many headers: some are optional, but you want defaults for them:

headers = parse_headers( msg ) # parse the message, get a dict
# now add all the optional headers
for headername, defaultvalue in optional_headers:
    headers.setdefault( headername, defaultvalue )
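
A minimal, self-contained sketch of the header pattern above, written for Python 3. The `parse_headers` stand-in, the `OPTIONAL_HEADERS` list, and the header names are assumptions made up for illustration; they are not part of the original answer.

```python
# Hypothetical defaults. The values deliberately have different types,
# which a single defaultdict factory could not provide.
OPTIONAL_HEADERS = [
    ("Content-Type", "text/plain"),
    ("Content-Length", 0),
    ("X-Tags", []),
]

def parse_headers(msg):
    """Stand-in parser: split 'Name: value' lines into a plain dict."""
    headers = {}
    for line in msg.splitlines():
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers

headers = parse_headers("Content-Type: text/html\nHost: example.org")

# Fill in only the headers that the message did not supply.
for name, default in OPTIONAL_HEADERS:
    headers.setdefault(name, default)

print(headers)
```

The parsed `Content-Type` survives, while the two missing headers receive their defaults.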

edited Feb 24 '20 at 01:09

Stan James


answered Aug 14 '10 at 14:05

Jochen Ritzel


1

Indeed, this IMHO is the chief use case for replacement by `defaultdict`. Can you give an example of what you mean in the first paragraph? – Eli Bendersky Aug 14 '10 at 14:11
I would certainly not do that for the last example. Why not use `headers = dict(optional_headers); headers.update(parse_headers(msg))` or even a `defaultdict` for headers before using `update`? – Muhammad Alkarouri Aug 15 '10 at 06:09
2

Muhammad Alkarouri: What you do first is copy the dict then overwrite some of the items. I do that a lot too and I guess that is actually the idiom most prefer over `setdefault`. A `defaultdict` on the other hand wouldn't work if not all the `defaultvalues` are equal (ie some are `0` and some are `[]`). – Jochen Ritzel Aug 15 '10 at 11:04
2

@YHC4k, yes. That is why I used `headers = dict(optional_headers)`. For the case when the default values are not all equal. And the end result is the same as if you get the HTTP headers first then set the defaults for those you didn't get. And it is quite usable if you already have `optional_headers`. Try my given 2 step code and compare it to yours, and you'll see what I mean. – Muhammad Alkarouri Aug 15 '10 at 12:01
A real-world example: the [django-dotenv](https://github.com/jpadilla/django-dotenv/blob/1f96f1784b2ead331f18fa6c27b8533828ac0b89/dotenv.py#L51) module uses `setdefault` to set values in `os.environ` if they are missing. – André Laszlo Feb 27 '15 at 12:32
32

or just do `new.setdefault(key, []).append(value)` – fmalina Aug 25 '15 at 13:44
3

I find it weird that the best answer boils down to `defaultdict` is even better than `setdefault` (so where's the use case now ?). Also, `ChainMap` would better handle the `http` example, IMO. – YvesgereY Mar 16 '16 at 09:28
The method `setdefault` is one of the few constructs in Python that violates the [Command-query separation](https://en.wikipedia.org/wiki/Command%E2%80%93query_separation) principle (it both modifies the object AND returns a value), thus creates code which is atypical and harder to read. Moreover, it can be mostly replaced by `defaultdict`. – Jeyekomon Jun 15 '21 at 09:59
@Jeyekomon Yes, which means his example only works because he assigns a list object, which is mutable. With an immutable object (int/float/str/...) it wouldn't work, for example to make a counter, since modifying the return value wouldn't update the value in the dictionary – Louis Cottereau Jan 12 '23 at 10:33

score 37 · Answer 2 · answered Aug 14 '10 at 15:01

I commonly use setdefault for keyword argument dicts, such as in this function:

def notify(self, level, *pargs, **kwargs):
    kwargs.setdefault("persist", level >= DANGER)
    self.__defcon.set(level, **kwargs)
    try:
        kwargs.setdefault("name", self.client.player_entity().name)
    except pytibia.PlayerEntityNotFound:
        pass
    return _notify(level, *pargs, **kwargs)

It's great for tweaking arguments in wrappers around functions that take keyword arguments.
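
A small sketch of that wrapper pattern for Python 3. Both `send` and `send_reliably` are hypothetical names invented here to keep the example self-contained.

```python
def send(message, retries=0, timeout=None, verbose=False):
    """Hypothetical low-level function that takes keyword arguments."""
    return {"message": message, "retries": retries,
            "timeout": timeout, "verbose": verbose}

def send_reliably(message, **kwargs):
    """Wrapper that supplies defaults but never overrides the caller."""
    kwargs.setdefault("retries", 3)
    kwargs.setdefault("timeout", 10.0)
    return send(message, **kwargs)

default_call = send_reliably("ping")            # uses retries=3
custom_call = send_reliably("ping", retries=1)  # caller's value wins
```

Because `setdefault` leaves existing keys alone, the wrapper can change the defaults without having to list every parameter of the wrapped function.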

score 19 · Answer 3 · answered May 30 '11 at 07:54


defaultdict is great when the default value is static, like a new list, but not so much if it's dynamic.

For example, I need a dictionary to map strings to unique ints. defaultdict(int) will always use 0 for the default value. Likewise, defaultdict(intGen()) always produces 1.

Instead, I used a regular dict:

nextID = intGen()
myDict = {}
for lots of complicated stuff:
    #stuff that generates unpredictable, possibly already seen str
    strID = myDict.setdefault(myStr, nextID())

Note that dict.get(key, nextID()) is insufficient because I need to be able to refer to these values later as well.

intGen is a tiny class I built that automatically increments an int and returns its value:

class intGen:
    def __init__(self):
        self.i = 0

    def __call__(self):
        self.i += 1
        return self.i

If someone has a way to do this with defaultdict I'd love to see it.
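
In the spirit of the suggestions in the comments on this answer, here is a hedged Python 3 sketch comparing the two approaches. It uses `itertools.count` in place of `intGen`, and the `words` list is made-up sample data.

```python
from collections import defaultdict
from itertools import count

words = ["a", "b", "a", "c", "b"]

# setdefault: the default argument is evaluated on every call,
# so an ID is consumed even when the key is already present.
next_id = count(1).__next__
with_setdefault = {}
for word in words:
    with_setdefault.setdefault(word, next_id())

# defaultdict: the factory runs only when a key is actually missing.
with_factory = defaultdict(count(1).__next__)
for word in words:
    with_factory[word]

print(with_setdefault)     # {'a': 1, 'b': 2, 'c': 4}
print(dict(with_factory))  # {'a': 1, 'b': 2, 'c': 3}
```

Both mappings assign unique IDs, but the `setdefault` version leaves gaps because repeated keys still consume a value.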

answered May 30 '11 at 07:54

David Kanarek


for a way to do it with (a subclass of) defaultdict, see this question: http://stackoverflow.com/questions/2912231/is-there-a-clever-way-to-pass-the-key-to-defaultdicts-default-factory – weronika Sep 07 '11 at 06:55
10

You could replace `intGen` with `itertools.count().next`. – Antimony Oct 24 '12 at 04:36
11

`nextID()`'s value is going to be incremented every time `myDict.setdefault()` is called, even if the value it returns isn't used as a `strID`. This seems wasteful somehow and illustrates one of the things I don't like about `setdefault()` in general -- namely that it always evaluates its `default` argument whether or not it actually gets used. – martineau Jan 06 '13 at 19:57
1

You can do it with `defaultdict`: `myDict = defaultdict(lambda: nextID())`. Later, `strID = myDict[myStr]` in the loop. – musiphil Aug 27 '15 at 19:06
_"If someone has a way to do this with defaultdict I'd love to see it."_ --> http://ideone.com/psOZ5M – moooeeeep Apr 05 '17 at 08:29
6

To get the behavior you describe with defaultdict, why not just `myDict = defaultdict(nextID)`? – forty_two May 15 '17 at 04:36

picmate 涅 · Answer 4 · 2019-12-11T15:28:02.763

As most answers state, setdefault or defaultdict lets you set a default value when a key doesn't exist. However, I would like to point out a small caveat with regard to the use cases of setdefault. When the Python interpreter executes setdefault, it always evaluates the second argument to the function, even if the key exists in the dictionary. For example:

In: d = {1:5, 2:6}

In: d
Out: {1: 5, 2: 6}

In: d.setdefault(2, 0)
Out: 6

In: d.setdefault(2, print('test'))
test
Out: 6

As you can see, print was also executed even though 2 already existed in the dictionary. This becomes particularly important if you plan to use setdefault for an optimization such as memoization. If you pass a recursive function call as the second argument to setdefault, you gain no performance from it, because Python always makes the recursive call.

Since memoization was mentioned, a better alternative is to use the functools.lru_cache decorator if you want to add memoization to a function. lru_cache handles the caching requirements for a recursive function better.
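
A short Python 3 sketch of the difference. The `slow_square` and `cached_square` functions and the `calls` list are hypothetical helpers, used only to count how often the computation really runs.

```python
from functools import lru_cache

calls = []

def slow_square(x):
    """Stand-in for an expensive computation; records each call."""
    calls.append(x)
    return x * x

memo = {}
memo.setdefault(3, slow_square(3))   # computes and stores 9
memo.setdefault(3, slow_square(3))   # computes again, result discarded
eager_calls = len(calls)             # 2

@lru_cache(maxsize=None)
def cached_square(x):
    calls.append(x)
    return x * x

calls.clear()
cached_square(3)
cached_square(3)                     # served from the cache
cached_calls = len(calls)            # 1
```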

score 11 · Answer 5 · edited May 23 '17 at 12:34


I use setdefault() when I want a default value in an OrderedDict. There isn't a standard Python collection that does both, but there are ways to implement such a collection.
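
A minimal Python 3 sketch of that combination, using made-up sample pairs. Note that from Python 3.7 onward plain dicts also preserve insertion order.

```python
from collections import OrderedDict

groups = OrderedDict()
for key, value in [("b", 1), ("a", 2), ("b", 3)]:
    # Keys keep their first-seen order; values get a default list.
    groups.setdefault(key, []).append(value)

print(list(groups.items()))  # [('b', [1, 3]), ('a', [2])]
```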

edited May 23 '17 at 12:34

Community


answered Jan 21 '14 at 22:17

AndrewL


score 10 · Answer 6 · answered Jun 09 '11 at 03:49

As Muhammad said, there are situations in which you only sometimes wish to set a default value. A great example of this is a data structure which is first populated, then queried.

Consider a trie. When adding a word, if a subnode is needed but not present, it must be created to extend the trie. When querying for the presence of a word, a missing subnode indicates that the word is not present and it should not be created.

A defaultdict cannot do this. Instead, a regular dict with the get and setdefault methods must be used.
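
One way this could look, as a rough Python 3 sketch built on nested plain dicts. The `END` marker and the function names are assumptions chosen for illustration.

```python
END = "$"  # hypothetical marker key meaning "a word ends here"

def add_word(trie, word):
    """Insert a word, creating missing subnodes with setdefault."""
    node = trie
    for char in word:
        node = node.setdefault(char, {})
    node[END] = True

def has_word(trie, word):
    """Query a word with get, so missing subnodes are never created."""
    node = trie
    for char in word:
        node = node.get(char)
        if node is None:
            return False
    return END in node

trie = {}
add_word(trie, "cat")
add_word(trie, "car")
```

Querying for a word that is absent, such as `has_word(trie, "dog")`, returns `False` and leaves the trie unchanged.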

Muhammad Alkarouri · Answer 7 · 2010-08-14T22:04:53.717

5

Theoretically speaking, setdefault would still be handy if you sometimes want to set a default and sometimes not. In real life, I haven't come across such a use case.

However, an interesting use case comes up from the standard library (Python 2.6, _threadinglocal.py):

>>> mydata = local()
>>> mydata.__dict__
{'number': 42}
>>> mydata.__dict__.setdefault('widgets', [])
[]
>>> mydata.widgets
[]

I would say that using __dict__.setdefault is a pretty useful case.

Edit: As it happens, this is the only example in the standard library and it is in a comment. So maybe it is not enough of a case to justify the existence of setdefault. Still, here is an explanation:

Objects store their attributes in the __dict__ attribute. As it happens, the __dict__ attribute is writable at any time after object creation. It is also a dictionary, not a defaultdict. It is not sensible for objects in the general case to have __dict__ as a defaultdict, because that would give every object all legal identifiers as attributes. So I can't foresee any change to Python objects getting rid of __dict__.setdefault, apart from deleting it altogether if it were deemed not useful.
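
A small Python 3 sketch of the same idea on an ordinary instance. The `Plugin` class is hypothetical; any instance with a regular `__dict__` would behave the same way.

```python
class Plugin:
    """Hypothetical class whose instances get attributes lazily."""

plugin = Plugin()
plugin.number = 42

# Create the attribute only if it does not exist yet.
widgets = plugin.__dict__.setdefault("widgets", [])
widgets.append("button")

# An existing attribute is left untouched.
plugin.__dict__.setdefault("number", 0)

print(plugin.widgets, plugin.number)  # ['button'] 42
```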

edited Aug 14 '10 at 22:04

answered Aug 14 '10 at 14:52

Muhammad Alkarouri


1

Could you elaborate - what makes __dict__.setdefault particularly useful? – Eli Bendersky Aug 14 '10 at 15:48
1

@Eli: I think the point is that `__dict__` is by implementation a `dict`, not a `defaultdict`. – Katriel Aug 14 '10 at 18:47
1

Alright. I don't mind about `setdefault` staying in Python, but it's curious to see that it's now almost useless. – Eli Bendersky Aug 15 '10 at 05:50
@Eli: I agree. I don't think there are enough reasons for it to be introduced today if it wasn't there. But being there already, it would be difficult to argue for removing it, given all the code using it already. – Muhammad Alkarouri Aug 15 '10 at 12:26
1

File under defensive programming. `setdefault` makes explicit that you are assigning to a dict via a key that may or may not exist, and if it does not exist you want it created with a default value: for example `d.setdefault(key,[]).append(value)`. Elsewhere in the program you do `alist=d[k]` where k is computed, and you want an exception thrown if k is not in d (which with a defaultdict might require `assert k in d` or even `if not ( k in d): raise KeyError`) – nigel222 Sep 10 '15 at 09:57

xged · Answer 8 · 2018-09-12T09:01:17.470

4

One drawback of defaultdict compared with dict (dict.setdefault) is that a defaultdict object creates a new item EVERY TIME a nonexistent key is looked up (e.g. with ==, print). Also, the defaultdict class is generally much less common than the dict class, and in my experience it is more difficult to serialize.
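
A brief Python 3 sketch of that side effect, with made-up key names. Subscript lookups insert the missing key, while membership tests and `get` do not.

```python
from collections import defaultdict

counts = defaultdict(int)
plain = {}

# Reading a missing key from a defaultdict inserts it as a side effect.
is_zero = counts["missing"] == 0

# Membership tests and get() do not insert anything.
found = "other" in counts
value = counts.get("other")

# A plain dict only gains a key when setdefault is called explicitly.
plain.get("missing")

print(dict(counts))  # {'missing': 0}
print(plain)         # {}
```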

P.S. IMO functions|methods not meant to mutate an object, should not mutate an object.

edited Sep 12 '18 at 09:01

answered Dec 03 '16 at 04:58

xged


It doesn't have to create a new object every time. You can just as easily do `defaultdict(lambda l=[]: l)` instead. – Artyer Mar 01 '17 at 17:43
13

Never do what @Artyer suggests--mutable defaults will bite you. – Brandon Humpert May 11 '17 at 21:25

AbstProcDo · Answer 9 · 2018-06-17T00:58:57.477

I rewrote the accepted answer and simplified it for newcomers.

# Break it down to understand it intuitively.
new = {}
for (key, value) in data:
    if key not in new:
        new[key] = [] # this is the core of setdefault; equivalent to new.setdefault(key, [])
        new[key].append(value)
    else:
        new[key].append(value)


# easy with setdefault
new = {}
for (key, value) in data:
    group = new.setdefault(key, []) # if key is missing, this does new[key] = []
    group.append(value)



# even simpler with defaultdict
from collections import defaultdict
new = defaultdict(list)
for (key, value) in data:
    new[key].append(value) # all keys have a default value of empty list []

Additionally, I categorized the dict methods for reference:

dict_methods_11 = {
            'views':['keys', 'values', 'items'],
            'add':['update','setdefault'],
            'remove':['pop', 'popitem','clear'],
            'retrieve':['get',],
            'copy':['copy','fromkeys'],}

dict_methods_11 = { 'views':['keys', 'values', 'items'], 'add':['update','setdefault'], 'remove':['pop', 'popitem','clear'], 'retrieve':['get'], 'copy':['copy','fromkeys']} There were two extra commas I've edited them. — Ali Hassan, Oct 18 '20 at 16:00

score 3 · Answer 10 · answered Sep 21 '14 at 21:05

Here are some examples of setdefault to show its usefulness:

"""
d = {}
# To add a key->value pair, do the following:
d.setdefault(key, []).append(value)

# To retrieve a list of the values for a key
list_of_values = d[key]

# To remove a key->value pair is still easy, if
# you don't mind leaving empty lists behind when
# the last value for a given key is removed:
d[key].remove(value)

# Despite the empty lists, it's still possible to
# test for the existence of values easily:
if key in d and d[key]:
    pass # d has some values for key

# Note: Each value can exist multiple times!
"""
e = {}
print e
e.setdefault('Cars', []).append('Toyota')
print e
e.setdefault('Motorcycles', []).append('Yamaha')
print e
e.setdefault('Airplanes', []).append('Boeing')
print e
e.setdefault('Cars', []).append('Honda')
print e
e.setdefault('Cars', []).append('BMW')
print e
e.setdefault('Cars', []).append('Toyota')
print e

# NOTE: now e['Cars'] == ['Toyota', 'Honda', 'BMW', 'Toyota']
e['Cars'].remove('Toyota')
print e
# NOTE: it's still true that ('Toyota' in e['Cars'])

It is great to see an example of using the return value from `setdefault`. That's a much simpler way of use. — Martlark, Jun 21 '23 at 23:51

score 2 · Answer 11 · answered Aug 25 '15 at 18:42

I use setdefault frequently when, get this, setting a default (!!!) in a dictionary; somewhat commonly the os.environ dictionary:

# Set the venv dir if it isn't already overridden:
os.environ.setdefault('VENV_DIR', '/my/default/path')

Less succinctly, this looks like this:

# Set the venv dir if it isn't already overridden:
if 'VENV_DIR' not in os.environ:
    os.environ['VENV_DIR'] = '/my/default/path'

It's worth noting that you can also use the resulting variable:

venv_dir = os.environ.setdefault('VENV_DIR', '/my/default/path')

But that's less necessary than it was before defaultdicts existed.

score 2 · Answer 12 · answered Feb 17 '16 at 20:49

Another use case that I don't think was mentioned above: sometimes you keep a cache dict of objects keyed by their id, where the primary instance lives in the cache, and you want to add an object to the cache only when it is missing.

return self.objects_by_id.setdefault(obj.id, obj)

That's useful when you always want to keep a single instance per distinct id no matter how you obtain an obj each time. For example when object attributes get updated in memory and saving to storage is deferred.
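
A rough Python 3 sketch of such an identity map. The `Registry` and `Record` classes are hypothetical, invented here so the one-liner above has something to run against.

```python
class Registry:
    """Hypothetical identity map: one canonical instance per id."""

    def __init__(self):
        self.objects_by_id = {}

    def canonical(self, obj):
        # Stores obj if its id is new; otherwise returns the stored one.
        return self.objects_by_id.setdefault(obj.id, obj)

class Record:
    def __init__(self, id, name):
        self.id = id
        self.name = name

registry = Registry()
first = registry.canonical(Record(1, "original"))
second = registry.canonical(Record(1, "reloaded copy"))

print(first is second, second.name)  # True original
```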

score 1 · Answer 13 · answered Jan 23 '17 at 00:59

One very important use-case I just stumbled across: dict.setdefault() is great for multi-threaded code when you only want a single canonical object (as opposed to multiple objects that happen to be equal).

For example, the (Int)Flag Enum in Python 3.6.0 has a bug: if multiple threads are competing for a composite (Int)Flag member, there may end up being more than one:

from enum import IntFlag, auto
import threading

class TestFlag(IntFlag):
    one = auto()
    two = auto()
    three = auto()
    four = auto()
    five = auto()
    six = auto()
    seven = auto()
    eight = auto()

    def __eq__(self, other):
        return self is other

    def __hash__(self):
        return hash(self.value)

seen = set()

class cycle_enum(threading.Thread):
    def run(self):
        for i in range(256):
            seen.add(TestFlag(i))

threads = []
for i in range(8):
    threads.append(cycle_enum())

for t in threads:
    t.start()

for t in threads:
    t.join()

len(seen)
# 272  (should be 256)

The solution is to use setdefault() as the last step of saving the computed composite member -- if another has already been saved then it is used instead of the new one, guaranteeing unique Enum members.

0xack13 · Answer 14 · 2020-11-30T09:09:25.110

In addition to what has been suggested, setdefault can be useful when you don't want to modify a value that has already been set. For example, when you have duplicate numbers and you want to treat them as one group. In this case, if you encounter a repeated key that has already been set, you don't update its value; you keep the first value encountered, as if you were iterating over and updating the repeated keys only once.

Here's a code example of recording the index for the keys/elements of a sorted list:

nums = [2,2,2,2,2]
d = {}
for idx, num in enumerate(sorted(nums)):
    # Plain assignment would leave the index of the last repeated key:
    # d[num] = idx # Result (sorted_indices): [4, 4, 4, 4, 4]
    # With setdefault, repeated keys do not update the entry,
    # so only the first encountered key's index is stored.
    d.setdefault(num,idx) # Result (sorted_indices): [0, 0, 0, 0, 0]

sorted_indices = [d[i] for i in nums]

YvesgereY · Answer 15 · 2016-04-12T18:24:38.880

0

[Edit] Very wrong! The setdefault would always trigger long_computation, Python being eager.

Expanding on Tuttle's answer. For me the best use case is a cache mechanism. Instead of:

if x not in memo:
   memo[x]=long_computation(x)
return memo[x]

which consumes 3 lines and 2 or 3 lookups, ~~I would happily write~~ :

return memo.setdefault(x, long_computation(x))

edited Apr 12 '16 at 18:24

answered Mar 16 '16 at 09:18

YvesgereY


Good example. I still think the 3 lines are more comprehensible, but maybe my brain will grow to appreciate setdefault. – Bob Stein Mar 23 '16 at 02:12
5

Those are not equivalent. In the first, `long_computation(x)` is only called if `x not in memo`. Whereas in the second, `long_computation(x)` is always called. Only the assignment is conditional, the equivalent code to `setdefault` would look like: `v = long_computation(x)` / `if x not in memo:` / `memo[x] = v`. – Dan D. Apr 11 '16 at 22:59

score 0 · Answer 16 · answered Apr 24 '17 at 17:47

I like the answer given here:

http://stupidpythonideas.blogspot.com/2013/08/defaultdict-vs-setdefault.html

In short, the decision (in non-performance-critical apps) should be made on the basis of how you want to handle lookup of empty keys downstream (viz. KeyError versus default value).

score 0 · Answer 17 · answered Sep 17 '17 at 16:38

A different use case for setdefault() is when you don't want to overwrite the value of a key that is already set. Assigning to a key, even in a defaultdict, overwrites the existing value, while setdefault() does not. For nested dictionaries it is more often the case that you want to set a default only if the key is not set yet, because you don't want to remove the sub-dictionary that is already present. This is when you use setdefault().

Example with defaultdict:

>>> from collections import defaultdict
>>> foo = defaultdict()
>>> foo['a'] = 4
>>> foo['a'] = 2
>>> print(foo)
defaultdict(None, {'a': 2})

setdefault doesn't overwrite:

>>> bar = dict()
>>> bar.setdefault('a', 4)
4
>>> bar.setdefault('a', 2)
4
>>> print(bar)
{'a': 4}
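
For the nested-dictionary case mentioned above, a minimal Python 3 sketch. The `config` contents are made-up sample data.

```python
config = {"db": {"host": "localhost"}}

# The existing sub-dictionary is kept; only the missing key is added.
config.setdefault("db", {}).setdefault("port", 5432)

# A missing sub-dictionary is created on the way.
config.setdefault("cache", {}).setdefault("ttl", 60)

print(config)
# {'db': {'host': 'localhost', 'port': 5432}, 'cache': {'ttl': 60}}
```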

Matthew Moisen · Answer 18 · 2021-04-23T17:58:00.987

Another use case for setdefault in CPython is that it is atomic in all cases, whereas defaultdict will not be atomic if you use a default value created from a lambda.

import threading

cache = {}

def get_user_roles(user_id):
    if user_id in cache:
        return cache[user_id]['roles']

    cache.setdefault(user_id, {'lock': threading.Lock()})

    with cache[user_id]['lock']:
        roles = query_roles_from_database(user_id)
        cache[user_id]['roles'] = roles
    return roles

If two threads execute cache.setdefault at the same time, only one of them will be able to create the default value.

If instead you used a defaultdict:

cache = defaultdict(lambda: {'lock': threading.Lock()})

This would result in a race condition. In my example above, the first thread could create a default lock, and the second thread could create another default lock, and then each thread could lock its own default lock, instead of the desired outcome of each thread attempting to lock a single lock.

Conceptually, setdefault basically behaves like this (defaultdict also behaves like this if you use an empty list, empty dict, int, or other default value that is not user Python code like a lambda):

gil = threading.Lock()

def setdefault(dict, key, value_func):
    with gil:
        # the check and the assignment happen under the same lock
        if key in dict:
            return dict[key]

        value = value_func()

        dict[key] = value
        return value

Conceptually, looking up a missing key in a defaultdict basically behaves like this (only when using Python code like a lambda - this is not true if you use an empty list):

gil = threading.Lock()

def __getitem__(dict, key, value_func):
    with gil:
        if key in dict:
            return dict[key]

    # runs outside the lock, so another thread can get here too
    value = value_func()

    with gil:
        dict[key] = value
        return value
