
I understand the difference between copy and deepcopy in the copy module. I've used copy.copy and copy.deepcopy successfully before, but this is the first time I've actually gone about overloading the __copy__ and __deepcopy__ methods. I've already Googled around and looked through the built-in Python modules for examples of __copy__ and __deepcopy__ (e.g. sets.py, decimal.py, and fractions.py), but I'm still not 100% sure I've got it right.

Here's my scenario:

I have a configuration object. Initially, I'm going to instantiate one configuration object with a default set of values. This configuration will be handed off to multiple other objects (to ensure all objects start with the same configuration). However, once user interaction starts, each object needs to tweak its configuration independently, without affecting the others' configurations (which says to me I'll need to hand around deep copies of my initial configuration).

Here's a sample object:

class ChartConfig(object):

    def __init__(self):

        #Drawing properties (Booleans/strings)
        self.antialiased = None
        self.plot_style = None
        self.plot_title = None
        self.autoscale = None

        #X axis properties (strings/ints)
        self.xaxis_title = None
        self.xaxis_tick_rotation = None
        self.xaxis_tick_align = None

        #Y axis properties (strings/ints)
        self.yaxis_title = None
        self.yaxis_tick_rotation = None
        self.yaxis_tick_align = None

        #A list of non-primitive objects
        self.trace_configs = []

    def __copy__(self):
        pass

    def __deepcopy__(self, memo):
        pass 

What is the right way to implement the copy and deepcopy methods on this object to ensure copy.copy and copy.deepcopy give me the proper behavior?

Brent Writes Code
  • Does it work? Are there problems? – Ned Batchelder Sep 30 '09 at 21:33
  • I thought I was still getting problems with shared references, but it's entirely possible I messed up elsewhere. I'll double check based on @MortenSiebuhr's post when I get a chance and update with the results. – Brent Writes Code Sep 30 '09 at 21:44
  • From my currently limited understanding I would expect copy.deepcopy(ChartConfigInstance) to return a new instance which wouldn't have any shared references with the original (without reimplementing deepcopy yourself). Is this incorrect? – emschorsch Aug 11 '15 at 04:37

10 Answers


Putting together Alex Martelli's answer and Rob Young's comment you get the following code:

from copy import copy, deepcopy

class A(object):
    def __init__(self):
        print('init')
        self.v = 10
        self.z = [2,3,4]

    def __copy__(self):
        cls = self.__class__
        result = cls.__new__(cls)
        result.__dict__.update(self.__dict__)
        return result

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            setattr(result, k, deepcopy(v, memo))
        return result

a = A()
a.v = 11
b1, b2 = copy(a), deepcopy(a)
a.v = 12
a.z.append(5)
print(b1.v, b1.z)
print(b2.v, b2.z)

prints

init
11 [2, 3, 4, 5]
11 [2, 3, 4]

here __deepcopy__ fills in the memo dict to avoid excess copying in case the object itself is referenced from its member.
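To see why the memo entry matters, here is a minimal sketch with a hypothetical self-referential `Node` class; without registering `memo[id(self)] = result` *before* recursing, `deepcopy` would recurse forever on the cycle:

```python
from copy import deepcopy

class Node:
    def __init__(self):
        self.child = self  # the object references itself

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result  # register the copy *before* recursing
        for k, v in self.__dict__.items():
            setattr(result, k, deepcopy(v, memo))
        return result

n = Node()
m = deepcopy(n)
print(m is n, m.child is m)  # False True -- the cycle is preserved in the copy
```

When `deepcopy(v, memo)` reaches the self-reference, it finds `id(n)` already in `memo` and returns the half-built copy instead of recursing again.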

Antony Hatchkins
  • I think `__deepcopy__` should include a test to avoid infinite recursion: `d = id(self); result = memo.get(d, None); if result is not None: return result` – Antonín Hoskovec Jan 21 '19 at 09:33
  • @AntonyHatchkins It's not immediately clear from your post _where_ `memo[id(self)]` actually gets used to prevent infinite recursion. I have put together a [short example](https://pyfiddle.io/fiddle/8352e97e-ca12-4479-afd2-05cfc431a80e/?i=true) which suggests that `copy.deepcopy()` internally aborts the call to an object if its `id()` is a key of `memo`, correct? It is also worth noting that `deepcopy()` seems to do this on its own _by default_, which makes it hard to imagine a case where defining `__deepcopy__` manually is actually needed... – Jonathan H Mar 31 '19 at 12:34
  • Is it useful to do `memo[id(self)]` for a mutable object? I think `memo` is useful only if the object has a hash, so `try: memo[self]` seems better to me. – Marco Sulla Oct 20 '20 at 17:55
  • @MarcoSulla That does not work because `copy.deepcopy(obj, memo)` checks for `memo[id(obj)]`, so you have to use `id(self)`. – Holt Oct 31 '20 at 10:36
  • Shouldn't `b1` print `12 [2, 3, 4, 5]`? If `copy` is intended to be `shallow`, `b1` should be the same as `a`...? – adam.hendry Aug 31 '22 at 19:22

The recommendations for customizing are at the very end of the docs page:

Classes can use the same interfaces to control copying that they use to control pickling. See the description of module pickle for information on these methods. The copy module does not use the copy_reg registration module.

In order for a class to define its own copy implementation, it can define special methods __copy__() and __deepcopy__(). The former is called to implement the shallow copy operation; no additional arguments are passed. The latter is called to implement the deep copy operation; it is passed one argument, the memo dictionary. If the __deepcopy__() implementation needs to make a deep copy of a component, it should call the deepcopy() function with the component as first argument and the memo dictionary as second argument.

Since you appear not to care about pickling customization, defining __copy__ and __deepcopy__ definitely seems like the right way to go for you.

Specifically, __copy__ (the shallow copy) is pretty easy in your case...:

def __copy__(self):
    newone = type(self)()
    newone.__dict__.update(self.__dict__)
    return newone

__deepcopy__ would be similar (accepting a memo arg too), but before the return it would have to call newone.foo = deepcopy(self.foo, memo) for any attribute self.foo that needs deep copying (essentially attributes that are containers -- lists, dicts, non-primitive objects which hold other stuff through their __dict__s).
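Applied to the ChartConfig class from the question, the __deepcopy__ described above might look like this (a sketch, abbreviated to two attributes; only trace_configs needs the recursive copy, since the other attributes are strings/ints/None):

```python
from copy import deepcopy

class ChartConfig(object):
    def __init__(self):
        self.plot_title = None
        self.trace_configs = []  # the only container attribute here

    def __copy__(self):
        newone = type(self)()
        newone.__dict__.update(self.__dict__)
        return newone

    def __deepcopy__(self, memo):
        newone = type(self)()
        memo[id(self)] = newone
        newone.__dict__.update(self.__dict__)
        # only the container needs an explicit deep copy
        newone.trace_configs = deepcopy(self.trace_configs, memo)
        return newone

a = ChartConfig()
a.trace_configs.append({'color': 'red'})
b = deepcopy(a)
b.trace_configs.append({'color': 'blue'})
print(len(a.trace_configs), len(b.trace_configs))  # 1 2 -- independent lists
```

Note this uses type(self)() for brevity, so __init__ runs on each copy; per the comments below, cls.__new__(cls) avoids that if the constructor is expensive or takes arguments.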

Alex Martelli
  • I think I have heard that it's better to override `__getstate__`/`__setstate__` to implement copying. Or am I confused? – u0b34a0f6ae Sep 30 '09 at 23:58
  • @kaizer, they're fine to customize pickling/unpickling as well as copying, but if you don't care about the pickling, it's simpler and more direct to use `__copy__`/`__deepcopy__`. – Alex Martelli Oct 01 '09 at 01:07
  • That doesn't seem to be a direct translation of copy/deepcopy. Neither copy nor deepcopy calls the constructor of the object being copied. Consider: `class Test1(object): def __init__(self): print "%s.%s" % (self.__class__.__name__, "__init__")` and `class Test2(Test1): def __copy__(self): new = type(self)(); return new`; then compare `copy.copy(Test1())` with `copy.copy(Test2())`. – Rob Young Jun 27 '11 at 18:01
  • Oh, well that was a bit of a fail, looks like you can't put code in comments. Well, it's just showing that with Test1 the constructor only gets called once, while with Test2 it gets called twice. – Rob Young Jun 27 '11 at 18:03
  • I think instead of `type(self)()`, you should use `cls = self.__class__; cls.__new__(cls)` to be insensitive to the constructor's interface (especially for subclassing). It is not really important here, however. – Juh_ Mar 05 '13 at 09:25
  • Why `self.foo = deepcopy(self.foo, memo)`...? Don't you really mean `newone.foo = ...`? – Alois Mahdal Sep 13 '13 at 13:36
  • @Juh_'s comment is spot on. You don't want to call `__init__`. That's not what copy does. Also there is very often a use case where pickling and copying need to be different. In fact, I don't even know why copy tries to use the pickling protocol by default. Copying is for in-memory manipulation, pickling is for cross-epoch persistence; they are completely different things that bear little relation to each other. – Nimrod Aug 10 '16 at 20:45
  • @AloisMahdal Unless I'm missing something, it doesn't really matter whether you give the current instance or the new instance the copy. Either way you end up with two separate, identical objects, one referenced from each instance. I agree `newone.foo` is a lot more intuitive, though. – Soren Bjornstad Jul 15 '18 at 01:19
  • Your quote is for python 2.7. For python 3.5+ and perhaps earlier, the copy module _does_ use the registered reducers from copyreg module. – Azmisov Sep 18 '20 at 18:45

Following Peter's excellent answer, here's how to implement a custom deepcopy with minimal alteration to the default implementation (e.g. just modifying a field, as I needed):

from copy import deepcopy

class Foo(object):
    def __deepcopy__(self, memo):
        # temporarily hide __deepcopy__ so the default machinery runs
        deepcopy_method = self.__deepcopy__
        self.__deepcopy__ = None
        cp = deepcopy(self, memo)
        self.__deepcopy__ = deepcopy_method
        cp.__deepcopy__ = deepcopy_method

        # custom treatments
        # for instance: cp.id = None

        return cp

Edit: a limitation of this approach, as Igor Kozyrenko points out, is that the copies' __deepcopy__ will still be bound to the original object, so a copy of a copy will actually be a copy of the original. There's perhaps a way to re-bind the __deepcopy__ to cp, instead of just assigning it with cp.__deepcopy__ = deepcopy_method

Eino Gourdin
  • is this preferred to using `delattr(self, '__deepcopy__')` then `setattr(self, '__deepcopy__', deepcopy_method)`? – joel Jul 15 '20 at 19:12
  • According to [this answer](https://stackoverflow.com/a/12801950/150015), both are equivalent; but setattr is more useful when setting an attribute whose name is dynamic / not known at coding time. – Eino Gourdin Jul 24 '20 at 06:40
  • This is my personal fave and I'm using it in production where an object has a logger, which then has a thread lock, which cannot be pickled. Save off the logger, set it to `None`, call the default for everything else, and then put it back. Future-proof because I don't need to worry about forgetting to handle a field, and inherited classes "just work." – Aaron D. Marasco Oct 27 '20 at 13:55
  • BTW I tried the `delattr()` one and it failed in Python 2.7 with `AttributeError`. The "set it to `None`" approach is what I've been using. – Aaron D. Marasco Oct 27 '20 at 15:07
  • Wonderful - useful for making deep copies of PyTorch nn.Modules with custom attributes. – eric.mitchell Jun 11 '21 at 17:16
  • @EinoGourdin `deepcopy_method = self.__deepcopy__` creates a reference bound to `self`, and then both objects get it instead of the unbound version from the class itself. This makes all copies made from any other copy actually always be made from the original object. And the original object is never deleted unless all copies are deleted. – Igor Kozyrenko Aug 12 '21 at 10:57
  • @joel @EinoGourdin to avoid always copying the first object this approach might be used: `self.__deepcopy__ = None; cp = deepcopy(self, memo); delattr(self, "__deepcopy__"); delattr(cp, "__deepcopy__")`. Maybe with an additional check in case self had `__deepcopy__` in its instance dict, but I'm not sure what meaningful can be done in that case. – Igor Kozyrenko Aug 12 '21 at 11:16

It's not clear from your question why you need to override these methods, since you don't want any custom copying behavior.

Anyhow, if you do want to customize the deep copy (e.g. by sharing some attributes and copying others), here is a solution:

from copy import deepcopy


def deepcopy_with_sharing(obj, shared_attribute_names, memo=None):
    '''
    Deepcopy an object, except for a given list of attributes, which should
    be shared between the original object and its copy.

    obj is some object
    shared_attribute_names: A list of strings identifying the attributes that
        should be shared between the original and its copy.
    memo is the dictionary passed into __deepcopy__.  Ignore this argument if
        not calling from within __deepcopy__.
    '''
    assert isinstance(shared_attribute_names, (list, tuple))
    shared_attributes = {k: getattr(obj, k) for k in shared_attribute_names}

    if hasattr(obj, '__deepcopy__'):
        # Do hack to prevent infinite recursion in call to deepcopy
        deepcopy_method = obj.__deepcopy__
        obj.__deepcopy__ = None

    for attr in shared_attribute_names:
        del obj.__dict__[attr]

    clone = deepcopy(obj)

    for attr, val in shared_attributes.items():
        setattr(obj, attr, val)
        setattr(clone, attr, val)

    if hasattr(obj, '__deepcopy__'):
        # Undo hack
        obj.__deepcopy__ = deepcopy_method
        del clone.__deepcopy__

    return clone



class A(object):

    def __init__(self):
        self.copy_me = []
        self.share_me = []

    def __deepcopy__(self, memo):
        return deepcopy_with_sharing(self, shared_attribute_names = ['share_me'], memo=memo)

a = A()
b = deepcopy(a)
assert a.copy_me is not b.copy_me
assert a.share_me is b.share_me

c = deepcopy(b)
assert c.copy_me is not b.copy_me
assert c.share_me is b.share_me
Peter
  • Doesn't the clone also need its `__deepcopy__` method reset, since it will have `__deepcopy__ = None`? – flutefreak7 Apr 06 '17 at 21:20
  • Nope. If a `__deepcopy__` method is not found (or `obj.__deepcopy__` returns None), then `deepcopy` falls back on the standard deep-copying function. This can be seen [here](https://github.com/python/cpython/blob/3.6/Lib/copy.py#L159). – Peter Apr 07 '17 at 13:01
  • But then b won't have the ability to deepcopy with sharing? `c = deepcopy(a)` would be different from `d = deepcopy(b)`, because d would be a default deepcopy, whereas c would share some attrs with a. – flutefreak7 Apr 07 '17 at 13:07
  • Ah, now I see what you're saying. Good point. I fixed it, I think, by deleting the fake `__deepcopy__ = None` attribute from the clone. See new code. – Peter Apr 10 '17 at 09:17
  • Maybe clear to the Python experts: if you use this code in Python 3, replace `for attr, val in shared_attributes.iteritems():` with `for attr, val in shared_attributes.items():`. – complexM Jan 21 '18 at 14:24
  • Peter - can you explain why removing the memo from the line `clone = deepcopy(obj)` and adding `del clone.__deepcopy__` solves flutefreak7's comment? How will clone be able to call `__deepcopy__` if it is deleted? – yehudahs May 12 '19 at 06:44

I might be a bit off on the specifics, but here goes.

From the copy docs:

  • A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
  • A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

In other words: copy() will copy only the top element and leave the rest as pointers into the original structure. deepcopy() will recursively copy over everything.

That is, deepcopy() is what you need.

If you need to do something really specific, you can override __copy__() or __deepcopy__(), as described in the manual. Personally, I'd probably implement a plain function (e.g. config.copy_config() or such) to make it plain that it isn't Python standard behaviour.
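The distinction is easy to demonstrate with the default implementations, no overrides needed (sketch with a hypothetical `Config` class):

```python
import copy

class Config:
    def __init__(self):
        self.title = "default"
        self.traces = [[1, 2], [3, 4]]  # nested mutable state

orig = Config()
shallow = copy.copy(orig)
deep = copy.deepcopy(orig)

orig.traces[0].append(99)
print(shallow.traces[0])  # [1, 2, 99] -- shallow copy shares the inner lists
print(deep.traces[0])     # [1, 2]     -- deep copy is fully independent
```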

Morten Siebuhr
  • *In order for a class to define its own copy implementation, it can define special methods `__copy__()` and `__deepcopy__()`.* http://docs.python.org/library/copy.html – SilentGhost Sep 30 '09 at 21:42
  • I'll double-check my code, thanks. I'm going to feel dumb if this was a simple bug elsewhere :-P – Brent Writes Code Sep 30 '09 at 21:45
  • @MortenSiebuhr You are correct. I wasn't entirely clear that copy/deepcopy would do anything by default without me overriding those functions. I was looking for actual code though that I can tweak later (e.g. if I don't want to copy all attributes), so I gave you an up-vote but I'm going to go with @AlexMartinelli's answer. Thanks! – Brent Writes Code Oct 01 '09 at 01:23

Building on Antony Hatchkins' clean answer, here's my version where the class in question derives from another custom class (so we need to call super):

import copy

class Foo(FooBase):
    def __init__(self, param1, param2):
        self._base_params = [param1, param2]
        super(Foo, self).__init__(*self._base_params)

    def __copy__(self):
        cls = self.__class__
        result = cls.__new__(cls)
        result.__dict__.update(self.__dict__)
        super(Foo, result).__init__(*self._base_params)
        return result

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            setattr(result, k, copy.deepcopy(v, memo))
        super(Foo, result).__init__(*self._base_params)
        return result
BoltzmannBrain

The copy module eventually falls back on the __getstate__()/__setstate__() pickling protocol, so these are also valid targets to override.

The default implementation just returns and sets the instance's __dict__, so you don't have to call super() or worry about Eino Gourdin's clever trick, above.
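For example, here's a sketch that excludes a hypothetical transient attribute (`_cache`) from all copies and pickles by overriding that pair:

```python
from copy import deepcopy

class Config:
    def __init__(self):
        self.values = {"style": "line"}
        self._cache = {"expensive": 42}  # hypothetical transient state

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_cache"]   # exclude from copies and pickles
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._cache = {}      # rebuild transient state on the copy

c = Config()
d = deepcopy(c)
print(d.values, d._cache)  # {'style': 'line'} {} -- values copied, cache reset
```

The same two methods also control pickle.dumps/loads behavior, which is the main caveat: copying and pickling can't be customized independently this way.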

ankostis

Peter's and Eino Gourdin's answers are clever and useful, but they have a very subtle bug!

Python methods are bound to their object. When you do cp.__deepcopy__ = deepcopy_method, you are actually giving cp a reference to __deepcopy__ bound to the original object. Any call to cp.__deepcopy__ will return a copy of the original! If you deepcopy your object and then deepcopy that copy, the output is NOT a copy of the copy!

Here's a minimal example of the behavior, along with my fixed implementation where you copy the __deepcopy__ implementation and then bind it to the new object:

from copy import deepcopy
import types


class Good:
    def __init__(self):
        self.i = 0

    def __deepcopy__(self, memo):
        deepcopy_method = self.__deepcopy__
        self.__deepcopy__ = None
        cp = deepcopy(self, memo)
        self.__deepcopy__ = deepcopy_method
        # Copy the function object
        func = types.FunctionType(
            deepcopy_method.__code__,
            deepcopy_method.__globals__,
            deepcopy_method.__name__,
            deepcopy_method.__defaults__,
            deepcopy_method.__closure__,
        )
        # Bind to cp and set
        bound_method = func.__get__(cp, cp.__class__)
        cp.__deepcopy__ = bound_method

        return cp


class Bad:
    def __init__(self):
        self.i = 0

    def __deepcopy__(self, memo):
        deepcopy_method = self.__deepcopy__
        self.__deepcopy__ = None
        cp = deepcopy(self, memo)
        self.__deepcopy__ = deepcopy_method
        cp.__deepcopy__ = deepcopy_method
        return cp


x = Bad()
copy = deepcopy(x)
copy.i = 1
copy_of_copy = deepcopy(copy)
print(copy_of_copy.i)  # 0

x = Good()
copy = deepcopy(x)
copy.i = 1
copy_of_copy = deepcopy(copy)
print(copy_of_copy.i)  # 1
Zach Price

Along the lines of Zach Price's answer, there is a simpler way to achieve that goal: take the underlying function of the original __deepcopy__ method and rebind it to cp.

from copy import deepcopy
import types


class Good:
    def __init__(self):
        self.i = 0

    def __deepcopy__(self, memo):
        deepcopy_method = self.__deepcopy__
        self.__deepcopy__ = None
        cp = deepcopy(self, memo)
        self.__deepcopy__ = deepcopy_method
        
        # Bind to cp by types.MethodType
        cp.__deepcopy__ = types.MethodType(deepcopy_method.__func__, cp)

        return cp
NeverMore

I came here for performance reasons. Using the default copy.deepcopy() was slowing my code down by up to 30 times. Using @Antony Hatchkins' answer as a starting point, I realized that copy.deepcopy() is really slow for e.g. lists. I replaced the setattr loop with simple [:] slicing to copy whole lists. For anyone concerned with performance, it is worthwhile doing timeit.timeit() comparisons and replacing calls to copy.deepcopy() with faster alternatives.

setup = 'import copy; l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]'
timeit.timeit(setup = setup, stmt='m=l[:]')
timeit.timeit(setup = setup, stmt='m=l.copy()')
timeit.timeit(setup = setup, stmt='m=copy.deepcopy(l)')

will give these results:

0.11505379999289289
0.09126630000537261
6.423627900003339
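A sketch of that idea (hypothetical class; the slicing shortcut is safe only when the list holds immutable items such as numbers, because [:] is a shallow copy):

```python
from copy import deepcopy

class FastConfig:
    def __init__(self):
        self.samples = list(range(1000))    # flat list of ints
        self.nested = {"traces": [[1, 2]]}  # genuinely nested data

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result
        result.samples = self.samples[:]             # cheap shallow slice
        result.nested = deepcopy(self.nested, memo)  # still needs a real deep copy
        return result

f = FastConfig()
g = deepcopy(f)
print(g.samples == f.samples, g.samples is f.samples)  # True False
```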
eltings