
I was checking out Peter Norvig's code on how to write simple spell checkers. At the beginning, he uses this code to insert words into a dictionary.

import collections

def train(features):
    model = collections.defaultdict(lambda: 1)  # unseen keys start at 1
    for f in features:
        model[f] += 1
    return model

What is the difference between a Python dict and the one that was used here? In addition, what is the lambda for? I checked the API documentation here, and it says that defaultdict is actually derived from dict, but how does one decide which one to use?

Legend
  • What happens if you try that above code using `model = {}` (which is an ordinary dict)? – Greg Hewgill Jul 05 '11 at 23:03
  • A `defaultdict` allows you to specify a function that will generate the default value if a key in the dictionary doesn't exist. – Jeff Mercado Jul 05 '11 at 23:03
  • @Greg Hewgill: Yes. It will raise a KeyError, but I can get around it by using `setdefault`, right? Please correct me if I am wrong. Also, could you tell me what the lambda is being used for? – Legend Jul 05 '11 at 23:04
  • `dict` and `collections.defaultdict` are both completely defined in the documentation. What **specific** questions do you have about the actual words used in the actual documentation? It seems clear to us. Can you provide some hint as to what is not clear to you? – S.Lott Jul 06 '11 at 00:46

4 Answers


The difference is that a defaultdict will "default" a value if that key has not been set yet. If you didn't use a defaultdict, you'd have to check whether that key exists and, if it doesn't, set it to what you want.

The lambda is defining a factory for the default value. That function gets called whenever it needs a default value. You could hypothetically have a more complicated default function.
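To make the factory idea concrete, here is a small sketch that builds the same counts twice: once with the `defaultdict(lambda: 1)` from the question, and once with a plain dict and an explicit membership check. The names `words` and `plain` are illustrative only.

```python
import collections

words = ["spam", "eggs", "spam"]

# defaultdict: the lambda is called (with no arguments) for each missing key,
# so every unseen word starts at 1 before the += 1 is applied.
model = collections.defaultdict(lambda: 1)
for w in words:
    model[w] += 1

# The same counts with a plain dict require an explicit membership check.
plain = {}
for w in words:
    if w not in plain:
        plain[w] = 1  # what the lambda would have produced
    plain[w] += 1

print(dict(model))  # {'spam': 3, 'eggs': 2}
print(plain)        # {'spam': 3, 'eggs': 2}
```

Note that with the defaultdict, simply reading `model["unseen"]` returns 1 (and inserts the key), whereas the plain dict would raise KeyError.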

Help on class defaultdict in module collections:

class defaultdict(__builtin__.dict)
 |  defaultdict(default_factory) --> dict with default factory
 |  
 |  The default factory is called without arguments to produce
 |  a new value when a key is not present, in __getitem__ only.
 |  A defaultdict compares equal to a dict with the same items.
 |  

(from help(type(collections.defaultdict())))

{}.setdefault is similar in nature, but it takes a value instead of a factory function. It sets the value only if the key doesn't already exist (and returns the value stored for that key either way), which is a bit different.
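A minimal sketch of that difference, grouping some made-up `pairs` both ways: `setdefault` receives a ready-made value on every call, while `defaultdict` receives a factory it only calls for missing keys. The final comparison also illustrates the help text above: a defaultdict compares equal to a dict with the same items.

```python
from collections import defaultdict

pairs = [("fruit", "apple"), ("veg", "kale"), ("fruit", "pear")]

# dict.setdefault takes the default *value*: a new [] is built on every call,
# even when the key already exists and that list is simply discarded.
grouped = {}
for kind, name in pairs:
    grouped.setdefault(kind, []).append(name)

# defaultdict takes a *factory*: list() is only called for missing keys.
grouped2 = defaultdict(list)
for kind, name in pairs:
    grouped2[kind].append(name)

print(grouped)             # {'fruit': ['apple', 'pear'], 'veg': ['kale']}
print(grouped2 == grouped) # True
```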

Donald Miner

Courtesy: https://shirishweb.wordpress.com/2017/05/06/python-defaultdict-versus-dict-get/

Using a normal dict

d = {}
d['Apple'] = 50
d['Orange'] = 20
print(d['Apple'])
print(d['Grapes'])  # raises KeyError: 'Grapes'

We can avoid this KeyError with a normal dict as well, by passing a default to get(). Let's see how:

d={}
d['Apple']=50
d['Orange']=20
print(d['Apple'])
print(d.get('Apple'))
print(d.get('Grapes',0)) # DEFAULTING

Using defaultdict

from collections import defaultdict
d = defaultdict(int)  # the callable in parentheses produces the default value; int() returns 0
d['Apple'] = 50
d['Orange'] = 20
print(d['Apple'])
print(d['Grapes'])  # no KeyError: prints 0

Using a user-defined function to supply the default value

from collections import defaultdict
def mydefault():
    return 0

d = defaultdict(mydefault)
d['Apple']=50
d['Orange']=20
print(d['Apple'])
print(d['Grapes'])
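One behavioural detail worth knowing (a sketch, not from the linked post): only the `d[key]` lookup calls the factory, and it also *stores* the result. `get()` and the `in` operator never trigger the default.

```python
from collections import defaultdict

d = defaultdict(int)
d['Apple'] = 50

print(d.get('Grapes'))   # None: get() does not call the factory
print('Grapes' in d)     # False
print(d['Grapes'])       # 0: the [] lookup calls int() and stores the result
print('Grapes' in d)     # True: the missing key was inserted
print(dict(d))           # {'Apple': 50, 'Grapes': 0}
```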

Summary

  1. With a normal dict, the default is supplied case by case at each lookup; with a defaultdict, it is declared once for the whole dictionary.

  2. According to the performance test in the linked post, defaulting with defaultdict is about twice as fast as defaulting with a normal dict. See https://shirishweb.wordpress.com/2017/05/06/python-defaultdict-versus-dict-get/ for the details of that test.
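To illustrate point 1, here is a word-count sketch over a made-up `text`: the plain dict repeats the default `0` at every `get()` call, while the defaultdict declares it once through the `int` factory. Both produce the same counts.

```python
from collections import defaultdict

text = "the cat and the hat and the bat"

# Plain dict: the default (0) is supplied at every lookup.
counts_get = {}
for word in text.split():
    counts_get[word] = counts_get.get(word, 0) + 1

# defaultdict: the default is declared once, via the int factory.
counts_dd = defaultdict(int)
for word in text.split():
    counts_dd[word] += 1

print(counts_get == counts_dd)  # True
print(counts_get)  # {'the': 3, 'cat': 1, 'and': 2, 'hat': 1, 'bat': 1}
```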

sakeesh

Use a defaultdict if you have some meaningful default value for missing keys and don't want to deal with them explicitly.

The defaultdict constructor takes a function (more generally, any callable that accepts no arguments) as a parameter and calls it to construct the value for each missing key.

lambda: 1

is the same as the parameterless function f that does this

def f():
    return 1

I forgot the reason the API was designed this way instead of taking a value as a parameter. If I had designed the defaultdict interface, it would be slightly more complicated: the missing-value creation function would take the missing key as a parameter.
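That key-aware variant can be sketched with the `__missing__` hook, which `dict.__getitem__` calls for absent keys in dict subclasses. The class name `KeyDefaultDict` is hypothetical, not part of the standard library; this version subclasses `dict` directly rather than `defaultdict`.

```python
class KeyDefaultDict(dict):
    """Hypothetical dict whose factory receives the missing key."""

    def __init__(self, factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.factory = factory

    def __missing__(self, key):
        # Called by dict.__getitem__ when key is absent; store the computed
        # value so later lookups find it without calling the factory again.
        value = self[key] = self.factory(key)
        return value


lengths = KeyDefaultDict(len)
print(lengths["spam"])  # 4
print(dict(lengths))    # {'spam': 4}
```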

Rob Neuhaus
  • If you took the value as a parameter, you'd have to be careful about mutable values. e.g. `defaultdict([])` would set the same (mutable) list as the value for every missing element, whereas `defaultdict(list)` always creates a new one – Ismail Badawi Jul 05 '11 at 23:09
  • I think the reason that `defaultdict`'s factory function takes no parameters is so it can be used with types whose `__init__()` constructors don't require any -- such as `int`, `list`, and `dict`. You can, of course, easily derive a subclass from `defaultdict` whose `__missing__()` method _does_ pass the key to the factory function. See the answer to [Is there a clever way to pass the key to defaultdict's default_factory?](http://stackoverflow.com/questions/2912231/is-there-a-clever-way-to-pass-the-key-to-defaultdicts-default-factory). – martineau Jan 04 '12 at 18:48
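The mutable-value pitfall from the first comment can be reproduced with a factory that keeps returning one shared list; passing the `list` type instead builds a fresh list per missing key. In the real API, `defaultdict([])` is rejected outright with a TypeError, because the first argument must be callable or None.

```python
from collections import defaultdict

# Pitfall: the factory returns the SAME list object for every missing key.
shared = []
bad = defaultdict(lambda: shared)
bad['a'].append(1)
bad['b'].append(2)
print(bad['a'])  # [1, 2]: 'a' and 'b' share one list

# Passing the list type as the factory builds a new list per missing key.
good = defaultdict(list)
good['a'].append(1)
good['b'].append(2)
print(good['a'])  # [1]

# A non-callable default value is not accepted at all.
try:
    defaultdict([])
except TypeError as exc:
    print("TypeError:", exc)
```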

Let's take a deep dive into Python dictionaries and the defaultdict class.

Python Dictionaries

dict is one of Python's built-in data structures; it stores data as key-value pairs.

Example:

d = {'a': 2, 'b': 5, 'c': 6}

Problem with Dictionary

Dictionaries work well until you encounter a missing key. If you look up a key that is not in the dictionary, Python raises a KeyError. Something like this:

d = {'a': 2, 'b': 5, 'c': 6}
d['z']  # 'z' is not in the dict, so this raises KeyError

You will see something like this:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
    d['z'] 
KeyError: 'z'

Solution to the above problem

There are several ways to overcome this problem:

Using built-in dict methods

setdefault

If the key is in the dictionary, return its value. If not, insert a key with a value of default and return default. default defaults to None:

>>> d = {'a' :2, 'b': 5, 'c': 6}
>>> d.setdefault('z', 0)
0  # returns 0 
>>> print(d)  # z has been added to the dictionary
{'a': 2, 'b': 5, 'c': 6, 'z': 0}

get

Return the value for key if the key is in the dictionary, else default. If the default is not given, it defaults to None, so that this method never raises a KeyError:

>>> d = {'a': 2, 'b': 5, 'c': 6}
>>> d.get('z', 0)
0  # returns 0 
>>> print(d)  # Doesn't add z to the dictionary unlike setdefault
{'a': 2, 'b': 5, 'c': 6}

Both methods solve our problem: neither raises a KeyError for a missing key. Apart from these two methods, Python's collections module offers another tool for this problem. Let's dig into the defaultdict in the collections module:

defaultdict

defaultdict lives in Python's collections module. You can use it like this:

from collections import defaultdict

d = defaultdict(int)

The defaultdict constructor takes a callable, default_factory, as its first argument; it is called with no arguments to produce the value for a missing key. For example:

  • int: default will be an integer value of 0

  • str: default will be an empty string ""

  • list: default will be an empty list []
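A quick sketch that checks the three factories listed above, and shows that a defaultdict created without a factory (`default_factory` is None) behaves like a plain dict and raises KeyError:

```python
from collections import defaultdict

# Each factory is called with no arguments to build the value for a missing key.
for factory in (int, str, list):
    d = defaultdict(factory)
    print(factory.__name__, repr(d["missing"]))
# int 0
# str ''
# list []

# With no factory (default_factory is None), lookups behave like a plain dict.
no_factory = defaultdict()
try:
    no_factory["missing"]
except KeyError:
    print("KeyError")
```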

Code:

from collections import defaultdict

d = defaultdict(list)
d['a']  # accessing a missing key inserts it with an empty list and returns that list
d['b'] = 1 # add a key-value pair to dict
print(d)

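The output will be defaultdict(<class 'list'>, {'a': [], 'b': 1}) ('a' comes first because it was inserted first, and dicts preserve insertion order in Python 3.7+).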
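As the comment on this answer points out, the more idiomatic use of `defaultdict(list)` is appending without first checking whether the key exists. A small grouping sketch, using made-up `words`:

```python
from collections import defaultdict

# Group words by first letter without checking whether the key exists yet.
words = ["apple", "avocado", "banana", "cherry", "blueberry"]
by_letter = defaultdict(list)
for w in words:
    by_letter[w[0]].append(w)

print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```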

defaultdict solves the same problem as the get() and setdefault() methods, so when should you use each?

When to use get()

If you need to look up a value without risking a KeyError and without modifying the dictionary, dict.get is the right choice: it returns the default value you specify but leaves the dictionary unchanged.

When to use setdefault()

If you need to add a default key-value pair to the original dictionary, setdefault is the right choice.

When to use defaultdict

defaultdict can do what setdefault does, but instead of supplying the default value on every setdefault call, you specify it once when you create the defaultdict. On the other hand, setdefault lets you supply a different default value for each key. Both have their own advantages depending on the use case.
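A sketch of that trade-off, with hypothetical `settings` and `tally` dictionaries: `setdefault` picks a default per key at each call site, while a defaultdict applies one factory to every missing key.

```python
from collections import defaultdict

# setdefault: the default can differ per key, chosen at each call site.
settings = {"retries": 5}
settings.setdefault("retries", 3)   # key exists: value stays 5
settings.setdefault("timeout", 30)  # key missing: inserted with 30
print(settings)  # {'retries': 5, 'timeout': 30}

# defaultdict: one factory shared by every missing key.
tally = defaultdict(int)
tally["errors"] += 1
tally["warnings"] += 2
print(dict(tally))  # {'errors': 1, 'warnings': 2}
```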

When it comes to efficiency:

defaultdict > setdefault() or get()

According to the benchmark results referenced below, defaultdict is about 2 times faster than get().

You can check the results here.
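If you want to measure this on your own machine, here is a rough `timeit` sketch comparing the three approaches on a synthetic word list. The function names are hypothetical, and the numbers will vary with hardware and Python version, so compare them relative to each other rather than expecting an exact ratio.

```python
import timeit
from collections import defaultdict

words = ("spam eggs ham " * 10_000).split()

def count_with_get():
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts

def count_with_setdefault():
    counts = {}
    for w in words:
        counts.setdefault(w, 0)
        counts[w] += 1
    return counts

def count_with_defaultdict():
    counts = defaultdict(int)
    for w in words:
        counts[w] += 1
    return counts

for fn in (count_with_get, count_with_setdefault, count_with_defaultdict):
    # Total seconds for 10 runs; results are machine-dependent.
    print(fn.__name__, timeit.timeit(fn, number=10))
```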

Tomerikoo
Manu Manoj
  • Very nice detailed answer! One problem is in the `defaultdict` example you give. You create a `defaultdict(list)` but then do `d['b'] = 1`. This misses the point of using a defaultdict. The more idiomatic use case is when we want to append something to a list without checking whether the key already exists. So I would change that example to `d['b'].append(1)` and show that it becomes a list `[1]` – Tomerikoo Dec 30 '21 at 18:51
  • @Tomerikoo It's a very good example and suits the above scenario well. Thanks for the edits. – Manu Manoj Dec 31 '21 at 04:41