
I have a defaultdict that looks like this:

dict1 = defaultdict(lambda: defaultdict(int))

The problem is, I can't pickle it using cPickle. One of the solutions that I found here is to use a module-level function instead of a lambda. My question is, what is a module-level function? How can I use the dictionary with cPickle?

Fynn Mahoney

10 Answers


In addition to Martijn's explanation:

A module-level function is a function defined at the top level of a module: it is not an instance method of a class, it is not nested within another function, and it is a "real" function with a name, not a lambda function.

So, to pickle your defaultdict, create it with a module-level function instead of a lambda function:

from collections import defaultdict
import pickle

def dd():
    return defaultdict(int)

dict1 = defaultdict(dd) # dd is a module-level function

Then you can pickle it:

tmp = pickle.dumps(dict1) # no exception
new = pickle.loads(tmp)
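In Python 3 there is no separate cPickle module; the standard `pickle` module uses its C implementation automatically. A minimal sketch of the same idea with a real file round-trip, assuming Python 3; the factory name `make_inner`, the file name and the sample keys are illustrative, not from the answer.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def make_inner():
    """Module-level default factory: pickle can find it again by name."""
    return defaultdict(int)


dict1 = defaultdict(make_inner)
dict1["fruit"]["apple"] += 3

# Write to and read back from a file inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "dict1.pkl")
    with open(path, "wb") as f:
        pickle.dump(dict1, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

# The factory survived, so missing keys still get nested defaults.
loaded["veg"]["carrot"] += 1
```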
sloth
  • this solution threw an error, but `def dd(): return 'something'` `mydict = defaultdict(dd)` worked – joshi123 May 12 '20 at 14:24
  • You're right @joshi123. To get this working it's necessary to replace the `return defaultdict(int)` with our real default value. In my case I was returning a string literal, so `return 'n'`. Thanks! – lwb May 26 '20 at 23:35
  • just realised @Addishiwot Shimels already had this answer below – joshi123 May 28 '20 at 17:32
  • @joshi123 so which error did you get? The code works just fine: https://ideone.com/ZoNKnh. Also, I don't see how Addishiwot Shimels' answer is different than mine, as it does the same: replacing the lambda with a module-level function. – sloth May 29 '20 at 05:55
  • @sloth can't seem to reproduce it now, apologies my mistake. I think it is a bit confusing calling `defaultdict` twice though – joshi123 Jun 01 '20 at 16:19
  • @joshi123 Yeah, I agree that's confusing. – sloth Jun 01 '20 at 18:37

Pickle wants to store all the instance attributes, and defaultdict instances store a reference to the default callable. Pickle recurses over each instance attribute.
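You can see that the default callable is part of what gets pickled by asking a defaultdict how it reduces itself. A minimal sketch, assuming CPython, where `defaultdict.__reduce__` returns the class, the constructor arguments (holding `default_factory`) and an iterator over the items:

```python
from collections import defaultdict

counts = defaultdict(int)
counts["a"] += 1

# __reduce__ tells pickle how to rebuild the object; the factory sits
# in the constructor arguments, so pickle must be able to save it too.
cls, args, state, list_items, dict_items = counts.__reduce__()
print(cls)               # <class 'collections.defaultdict'>
print(args)              # (<class 'int'>,)
print(list(dict_items))  # [('a', 1)]
```

With `lambda: defaultdict(int)` as the factory, `args` would contain the lambda, and that is the piece pickle cannot save.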

Pickle cannot handle lambdas; pickle only ever handles data, not code, and lambdas contain code. Functions can be pickled, but, just like classes, only if the function can be imported. A function defined at the module level can be imported. In that case pickle just stores a string: the full 'path' of the function, which is imported and referenced again when unpickling.
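A small experiment illustrating both points, as a sketch: the helper `try_dumps` and the factory `make_inner` are hypothetical names. The exact exception type for an unpicklable function has varied between Python versions, so the sketch catches both `pickle.PicklingError` and `AttributeError`.

```python
import pickle
from collections import defaultdict


def make_inner():
    return defaultdict(int)


def try_dumps(obj):
    """Return the pickled bytes, or the exception pickle raised."""
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError) as exc:
        # A lambda cannot be looked up by name in its module.
        return exc


lambda_result = try_dumps(defaultdict(lambda: defaultdict(int)))
named_result = try_dumps(defaultdict(make_inner))

print(type(lambda_result).__name__)
# Only the function's module and name are stored, not its code.
print(b"make_inner" in named_result)  # True
```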

Martijn Pieters

You can however use partial to accomplish this:

>>> import pickle
>>> from collections import defaultdict
>>> from functools import partial
>>> pickle.loads(pickle.dumps(defaultdict(partial(defaultdict, int))))
defaultdict(<functools.partial object at 0x94dd16c>, {})
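The session above only pickles an empty dict. A slightly longer sketch with some data, showing that the nested default still works after loading; the keys are illustrative. `partial(defaultdict, int)` is a callable returning `defaultdict(int)`, and a partial object pickles as its function plus arguments, both of which pickle can import by name.

```python
import pickle
from collections import defaultdict
from functools import partial

dict1 = defaultdict(partial(defaultdict, int))
dict1["x"]["y"] += 2

loaded = pickle.loads(pickle.dumps(dict1))
loaded["new"]["key"] += 1  # nested default still works after loading
print(dict(loaded["x"]))   # {'y': 2}
```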
jamylak
  • @Fred It's basically just a `defaultdict` where the default value is a `defaultdict(int)`. The code demonstrates that it can be successfully pickled – jamylak Feb 13 '19 at 01:25

To do this, just write the code you wanted to write. I'd use dill, which can serialize lambdas and defaultdicts. Dill can serialize almost anything in Python.

>>> import dill
>>> from collections import defaultdict
>>>
>>> dict1 = defaultdict(lambda: defaultdict(int))
>>> pdict1 = dill.dumps(dict1)
>>> _dict1 = dill.loads(pdict1)
>>> _dict1
defaultdict(<function <lambda> at 0x10b31b398>, {})
Mike McKerns
  • This works well. Is there a way to dump dict1 in a temp file and then load it back again? Something similar to the pickle operation of writing and reading from files. – Hypothetical Ninja Sep 07 '14 at 06:15
  • Sure. `dill` provides the usual `dump` and `load` that can be used just like `dump` and `load` from `pickle`. Additionally, you might want to check out `dill.temp.dump` which dumps to a `NamedTemporaryFile`. – Mike McKerns Sep 07 '14 at 12:33
  • Thanks, check out the latest question on my profile. You could post your answer there. :) – Hypothetical Ninja Sep 07 '14 at 14:21

A solution that still works as a one-liner for this case, and is actually more efficient than the lambda (or an equivalent def-ed function) to boot:

dict1 = defaultdict(defaultdict(int).copy)

That just makes a template defaultdict(int), and binds its copy method as the default factory for the outer defaultdict. Everything in there is picklable, and on CPython (where defaultdict is a built-in type implemented in C) it's more efficient than invoking any user-defined function to do the same job. No need for extra imports, wrapping, etc.
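A quick check of the behaviour described above, as a sketch with illustrative keys: every missing key gets its own fresh copy of the empty template, so inner dicts are not shared, and the bound `copy` method survives a pickle round-trip.

```python
import pickle
from collections import defaultdict

# The outer factory is the bound `copy` method of an empty template.
dict1 = defaultdict(defaultdict(int).copy)
dict1["a"]["x"] += 1

# A new key gets a new, empty copy; the template itself is never mutated.
print(dict1["b"]["x"])  # 0

loaded = pickle.loads(pickle.dumps(dict1))
loaded["c"]["y"] += 5
print(loaded["a"]["x"], loaded["c"]["y"])  # 1 5
```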

ShadowRanger
  • Elegant solution – abhinonymous Dec 03 '20 at 00:26
  • Beautiful idea! – Oliver Baumann Sep 29 '21 at 16:14
  • @OliverBaumann: Thanks! As it happens, the comment about performance no longer applies (see [update to answer and comments here](https://stackoverflow.com/a/35759455/364696)), though that's possibly a temporary issue (they optimized the code paths affecting `lambda`s; the code paths for `defaultdict(int).copy` could be optimized further and should be able to beat a `lambda` if this is done). It's still nice as a `pickle`-friendly one-liner though. – ShadowRanger Sep 29 '21 at 18:33
dict1 = defaultdict(lambda: defaultdict(int))
cPickle.dump(dict(dict1), file_handle)

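worked for me. Converting the outer defaultdict to a plain dict drops the unpicklable lambda factory; the inner values are still defaultdict(int), which pickles fine because int is a built-in.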
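A Python 3 sketch of the same approach (cPickle was folded into `pickle`). Here an in-memory `io.BytesIO` buffer stands in for `file_handle`, which is an assumption for illustration. Note that only the top level becomes a plain dict.

```python
import io
import pickle
from collections import defaultdict

dict1 = defaultdict(lambda: defaultdict(int))
dict1["a"]["x"] += 1

# dict() copies the top level into a plain dict, dropping the lambda factory.
buffer = io.BytesIO()  # stands in for an open file handle
pickle.dump(dict(dict1), buffer)

buffer.seek(0)
loaded = pickle.load(buffer)
print(type(loaded).__name__, type(loaded["a"]).__name__)  # dict defaultdict
```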

Avi

Replacing the anonymous lambda function with a normal function worked for me. As pointed out by Martijn, pickle cannot handle lambdas; pickle only handles data. Hence, changing the defaultdict construction from:

    dict_ = defaultdict(lambda: default_value)

to:

    def default_():
        return default_value

and then creating the default dict as follows worked for me:

    dict_ = defaultdict(default_)
  • I don't see what this adds to [sloth's answer from six years before](https://stackoverflow.com/a/16439720/364696)... – ShadowRanger Nov 01 '20 at 03:25
  • @ShadowRanger Given the evolution of languages over the years, there is *some* value in knowing that what used to hold six years ago still holds today. Perhaps this could be made explicit in the answer. – Salmonstrikes Oct 22 '21 at 10:39

If you don't care about preserving the defaultdict type, convert it:

import pickle

fname = "file.pkl"

# Convert each inner defaultdict, then the outer one, to a plain dict.
for key in nested_default_dict:
    nested_default_dict[key] = dict(nested_default_dict[key])
my_dict = dict(nested_default_dict)

with open(fname, "wb") as f:
    pickle.dump(my_dict, f)  # Now this will work

I think this is a great alternative, since when you are pickling, the object is probably in its final form... AND, if you really do need the defaultdict type again, you can simply convert it back after you unpickle:

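from collections import defaultdict

# Assumes the question's structure: defaultdict(lambda: defaultdict(int)).
for key in my_dict:
    my_dict[key] = defaultdict(int, my_dict[key])
nested_default_dict = defaultdict(lambda: defaultdict(int), my_dict)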
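The conversion loop above handles exactly two levels. For deeper nesting, a recursive helper can do the conversion; `to_plain_dict` is a hypothetical name, not part of the answer. It turns every dict subclass (including `Counter`) into a plain dict, which is fine for pickling but loses those types.

```python
import pickle
from collections import defaultdict


def to_plain_dict(obj):
    """Recursively replace dicts and dict subclasses with plain dicts."""
    if isinstance(obj, dict):  # defaultdict is a dict subclass
        return {key: to_plain_dict(value) for key, value in obj.items()}
    return obj


nested = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
nested["a"]["b"]["c"] += 1

data = pickle.dumps(to_plain_dict(nested))  # no lambdas left to pickle
print(pickle.loads(data))  # {'a': {'b': {'c': 1}}}
```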
birdmw

I'm currently doing something similar to the question poser; however, I'm using a subclass of defaultdict which has a member function that is used as the default_factory. In order to have my code work properly (I required the function to be defined at runtime), I simply added some code to prepare the object for pickling.

Instead of:

...
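pickle.dump(my_dict, file)
...

I use this:

...
factory = my_dict.default_factory
my_dict.default_factory = None  # drop the factory so pickle never sees it
try:
    pickle.dump(my_dict, file)
finally:
    my_dict.default_factory = factory  # restore it, even if dumping fails
...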

This isn't the exact code I used, as my tree is an object which creates instances of the tree's own type as indexes are requested (so I use a recursive member function to do the pre/post pickle operations), but this pattern also answers the question.
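The save-and-restore pattern above can be packaged as a context manager so the factory is restored even if pickling raises. A minimal sketch for a single, non-nested defaultdict; `factory_removed` is a hypothetical helper name. It clears only the top-level factory, so inner defaultdicts with lambda factories would still need the recursive treatment described above, and the pickled copy comes back without a factory.

```python
import pickle
from collections import defaultdict
from contextlib import contextmanager


@contextmanager
def factory_removed(dd):
    """Temporarily set dd.default_factory to None while the block runs."""
    factory = dd.default_factory
    dd.default_factory = None
    try:
        yield dd
    finally:
        dd.default_factory = factory  # restored even if pickling fails


counts = defaultdict(lambda: 0)  # a lambda factory cannot be pickled
counts["a"] += 1

with factory_removed(counts):
    data = pickle.dumps(counts)

loaded = pickle.loads(data)
print(loaded.default_factory)  # None: the factory was not saved
loaded.default_factory = lambda: 0  # reattach it if defaults are needed
```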

Sandy Chapman
  • Note that this is only good if you don't care to lose the `default_factory` of the pickled dict. If you don't need the factory any more, you can simply set it to `None` and be done (: – drevicko Sep 19 '14 at 03:41

Here is a function that builds a nested defaultdict of arbitrary depth around an arbitrary base type.

def wrap_defaultdict(instance, times):
    """Wrap an instance an arbitrary number of `times` to create nested defaultdict.
    
    Parameters
    ----------
    instance - the base default factory (a type), e.g., list, dict, int, collections.Counter
    times - the number of nested keys above `instance`; if `times=3`, dd[one][two][three] is an `instance()` value
    
    Notes
    -----
    using `x.copy` allows pickling (loading to ipyparallel cluster or pkldump)
        - thanks https://stackoverflow.com/questions/16439301/cant-pickle-defaultdict
    """
    from collections import defaultdict

    def _dd(x):
        return defaultdict(x.copy)

    dd = defaultdict(instance)
    for i in range(times-1):
        dd = _dd(dd)

    return dd
BML