584

I would like to make a deep copy of a dict in python. Unfortunately the .deepcopy() method doesn't exist for the dict. How do I do that?

>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> my_copy = my_dict.deepcopy()
Traceback (most recent calll last):
  File "<stdin>", line 1, in <module>
AttributeError: 'dict' object has no attribute 'deepcopy'
>>> my_copy = my_dict.copy()
>>> my_dict['a'][2] = 7
>>> my_copy['a'][2]
7

The last line should be 3.

I would like that modifications in my_dict don't impact the snapshot my_copy.

How do I do that? The solution should be compatible with Python 3.x.

Olivier Grégoire
  • 33,839
  • 23
  • 96
  • 137
  • 3
    I don't know if it's a duplicate, but this: http://stackoverflow.com/questions/838642/python-dictionary-deepcopy is awfully close. – charleslparker Mar 14 '13 at 19:40

4 Answers4

812

How about:

import copy
d = { ... }
d2 = copy.deepcopy(d)

Python 2 or 3:

Python 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import copy
>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> my_copy = copy.deepcopy(my_dict)
>>> my_dict['a'][2] = 7
>>> my_copy['a'][2]
3
>>>
James Bradbury
  • 1,708
  • 1
  • 19
  • 31
Lasse V. Karlsen
  • 380,855
  • 102
  • 628
  • 825
  • 21
    Indeed that works for the oversimplified example I gave. My keys are not numbers but objects. If I read the copy module documentation, I have to declare a __copy__()/__deepcopy__() method for the keys. Thank you very much for leading me there! – Olivier Grégoire Feb 24 '11 at 14:17
  • 5
    Is there any difference in Python 3.2 and 2.7 codes? They seem identical to me. If so, would be better a single block of code and a statement "Works for both Python 3 and 2" – MestreLion Jun 07 '14 at 03:59
  • 71
    It's also worth mentioning `copy.deepcopy` isn't thread safe. Learned this the hard way. On the other hand, depending on your use case, `json.loads(json.dumps(d))` _is_ thread safe, and works well. – rob Feb 07 '17 at 22:11
  • 11
    @rob you should post that comment as an answer. It is a viable alternative. The thread safety nuance is an important distinction. – BuvinJ Oct 23 '17 at 17:29
  • Of course, I guess one could also wrap `copy.deepcopy` in a homegrown thread safe function? – BuvinJ Oct 23 '17 at 17:30
  • 9
    @BuvinJ The issue is that `json.loads` doesn't solve the problem for all use cases where python `dict` attributes are not JSON serializable. It may help those who are only dealing with simple data structures, from an API for example, but I don't think it's enough of a solution to fully answer the OP's question. – rob Oct 24 '17 at 15:38
  • @rob, I understand the limitation. Thanks for posting your comment. You may have saved me some trouble. – BuvinJ Oct 24 '17 at 15:43
  • 2
    another fun fact: deepcopy of dict does not always preserve the order of elements: `d = dict() ; for n in range(8): d[(n,n+1)] = n ; assert d.keys() == copy.deepcopy(d).keys() ` – Francois Jan 03 '18 at 17:07
  • @rob Please can you convince me that JSON.dumps is threadsafe? This answer seems to contradict that hypothesis: https://stackoverflow.com/a/23781650 – Robino Jan 06 '18 at 15:07
  • 1
    What does the lack of thread safety mean here? i.e., would it be safe to copy `my_dict` from multiple threads, as long as `my_dict` isn't modified in the process? – kbolino Feb 05 '18 at 20:07
  • 2
    "thread safety" is a rather meaningless term, unless you use its opposite, "not thread safe". Thread safety implies expectations of behavior, but it doesn't detail *which* behavior. The thread safety a library author has in mind might not be the same a library user has in mind. For instance, is it enough that a collection doesn't corrupt itself if used on multiple threads? Do you need a check against `count > 0` followed by an implicit read of the first item to be consistent? In general, *reads* are thread safe as long as reading something doesn't have side-effects. – Lasse V. Karlsen Feb 06 '18 at 08:15
  • `json.loads(json.dumps(d))` makes integer key to str `{1: {2: 3}, 3: [1, 2, 3]}` -> `{'1': {'2': 3}, '3': [1, 2, 3]}` – 구마왕 Feb 14 '22 at 06:15
71

dict.copy() is a shallow copy function for dictionary
id is built-in function that gives you the address of variable

First you need to understand "why is this particular problem is happening?"

In [1]: my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}

In [2]: my_copy = my_dict.copy()

In [3]: id(my_dict)
Out[3]: 140190444167808

In [4]: id(my_copy)
Out[4]: 140190444170328

In [5]: id(my_copy['a'])
Out[5]: 140190444024104

In [6]: id(my_dict['a'])
Out[6]: 140190444024104

The address of the list present in both the dicts for key 'a' is pointing to same location.
Therefore when you change value of the list in my_dict, the list in my_copy changes as well.


Solution for data structure mentioned in the question:

In [7]: my_copy = {key: value[:] for key, value in my_dict.items()}

In [8]: id(my_copy['a'])
Out[8]: 140190444024176

Or you can use deepcopy as mentioned above.

theBuzzyCoder
  • 2,652
  • 2
  • 31
  • 26
  • 10
    Your solution doesn't work for nested dictionaries. deepcopy is preferable for that reason. – Charles Plager Apr 18 '18 at 18:03
  • 3
    @CharlesPlager Agreed! But you should also notice that list slicing doesn't work on dict `value[:]`. The solution was for the particular data structure mentioned in the question rather than a universal solution. – theBuzzyCoder Apr 23 '18 at 10:06
  • 2nd solution for dict comprehension would not work with nested lists as values ``` >>> my_copy = {k:v.copy() for k,v in my_dict.items()} >>> id(my_copy["a"]) 4427452032 >>> id(my_dict["a"]) 4427460288 >>> id(my_copy["a"][0]) 4427469056 >>> id(my_dict["a"][0]) 4427469056 ``` – Aseem Jain Mar 04 '21 at 03:27
  • This is the bets solution for people who do not want to or can not use extra libraries. I am using python in LaTex environment, due to which I could not afford to use a single extra library. So, this solution works like a charm for me. Thanks @theBuzzyCoder – Asif Mehmood Dec 20 '22 at 03:58
63

Python 3.x

from copy import deepcopy

# define the original dictionary
original_dict = {'a': [1, 2, 3], 'b': {'c': 4, 'd': 5, 'e': 6}}

# make a deep copy of the original dictionary
new_dict = deepcopy(original_dict)

# modify the dictionary in a loop
for key in new_dict:
    if isinstance(new_dict[key], dict) and 'e' in new_dict[key]:
        del new_dict[key]['e']

# print the original and modified dictionaries
print('Original dictionary:', original_dict)
print('Modified dictionary:', new_dict)

Which would yield:

Original dictionary: {'a': [1, 2, 3], 'b': {'c': 4, 'd': 5, 'e': 6}}
Modified dictionary: {'a': [1, 2, 3], 'b': {'c': 4, 'd': 5}}

Without new_dict = deepcopy(original_dict), 'e' element is unable to be removed.

Why? Because if the loop was for key in original_dict, and an attempt is made to modify original_dict, a RuntimeError would be observed:

"RuntimeError: dictionary changed size during iteration"

So in order to modify a dictionary within an iteration, a copy of the dictionary must be used.

Here is an example function that removes an element from a dictionary:

def remove_hostname(domain, hostname):
    domain_copy = deepcopy(domain)
    for domains, hosts in domain_copy.items():
        for host, port in hosts.items():
           if host == hostname:
                del domain[domains][host]
    return domain
xpros
  • 2,166
  • 18
  • 15
0

@Rob suggested a good alternative to copy.deepcopy() if you are planning to create standard APIs for your data (database or web request json payloads).

>>> import json
>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> json.loads(json.dumps(my_dict))
{'a': [1, 2, 3], 'b': [4, 5, 6]}

This approach is also thread safe. But it only works for jsonifiable (serializable) objects like str, dict, list, int, float, and None. Sometimes that jsonifiable constraint can be a good thing because it forces you to make your data structures compliant with standard database fixture and web request formats (for creating best-practice APIs).

If your data structures don't work directly with json.dumps (for example datetime objects and your custom classes) they will be need to be coerced into a string or other standard type before serializing with json.dumps(). And you'll need to run a custom deserializer as well after json.loads():

>>> from datetime import datetime as dt
>>> my_dict = {'a': (1,), 'b': dt(2023, 4, 9).isoformat()}
>>> d = json.loads(json.dumps(my_dict))
>>> d
{'a': [1], 'b': '2023-04-09T00:00:00'}
>>> for k in d:
...     try:
...         d[k] = dt.fromisoformat(d[k])
...     except:
...         pass
>>> d
{'a': [1], 'b': datetime.datetime(2023, 4, 9, 0, 0)}

Of course you need to do the serialization and deserialization on special objects recursively. Sometimes that's a good thing. This process normalizes all your objects to types that are directly serializable (for example tuples become lists) and you can be sure they'll match a reproducable data schema (for relational database storage).

And it's thread safe. The builtin copy.deepcopy() is NOT thread safe! If you use deepcopy within async code that can crash your program or corrupt your data unexpectedly long after you've forgotten about your code.

hobs
  • 18,473
  • 10
  • 83
  • 106
  • Please explain why it's thread-safe. Becasuse [the doc](https://docs.python.org/3/library/json.html) doesn't say that those methods are thread-safe. – Olivier Grégoire Apr 10 '23 at 12:51
  • I think @Rob is right, those functions merely create two new objects without reusing any of the data from the original object. And the intermediate str object is deleted. Constructing new built-in types is always thread safe. But now that I say it out loud, perhaps you're concerned that the mutable objects might be modified during `json.dumps()`. I'd have to do some testing to see if dicts and lists are locked while you're iterating through them. I thought `iter`s were thread safe. – hobs Apr 13 '23 at 19:53
  • @OlivierGrégoire Python dicts and lists are thread-safe, even for write operations, but this code is only doing reads from dicts and lists, so that makes it even more obviously thread safe: https://stackoverflow.com/a/6955678/623735 – hobs Apr 20 '23 at 00:42
  • So if I follow your reasoning, `copy.deepcopy()` is also thread-safe. Then why is this answer more relevant? Or what makes `copy.deepcopy()` not thread-safe? – Olivier Grégoire Apr 20 '23 at 05:39
  • @OlivierGrégoire It's not better, just different. I like knowing how to do things many different ways so I can choose the right approach for my application. And I can remember and debug this one much more easily than deepcopy. Deepcopy is doing a lot of magic in C that I can't control on configure. Serialization (jsonifying) gives me more control and visibility into what is going on. – hobs Apr 20 '23 at 19:34
  • 1
    It's just that your first sentence in the answer seem to indicate that `copy.deepcopy()` isn't thread-safe "@Rob suggested a thread safe alternative to copy.deepcopy():" while your answer is thread-safe. Actually none are thread-safe. So I'd like that you remove your assertion that your answer is thread-safe. – Olivier Grégoire Apr 20 '23 at 20:09
  • Good point! I'll clarify the answer. – hobs Apr 22 '23 at 19:22