93

I have a Python program that works with dictionaries a lot. I have to make copies of dictionaries thousands of times. I need a copy of both the keys and the associated contents. The copy will be edited and must not be linked to the original (e.g. changes in the copy must not affect the original.)

Keys are Strings, Values are Integers (0/1).

I currently use a simple way:

newDict = oldDict.copy()

Profiling my Code shows that the copy operation takes most of the time.

Are there faster alternatives to the dict.copy() method? What would be fastest?

tshepang
  • 12,111
  • 21
  • 91
  • 136
Joern
  • 933
  • 1
  • 6
  • 4
  • 1
    If the value can be either 0 or 1, would a `bool` be a better choice than an `int`? – Samir Talwar May 02 '11 at 19:31
  • 5
    And if you need thousands of copies of them, would bitmasks work even better? – Wooble May 02 '11 at 19:38
  • @Samir isn't `bool` in Python named `int` anyway. – Santa May 02 '11 at 19:42
  • I agree, though, that a bitmask might be more efficient for you (depending on how you use this "dict", really). – Santa May 02 '11 at 19:42
  • @Santa: Not as far as I know. Entirely separate types. – Samir Talwar May 02 '11 at 19:44
  • @Samir @Santa @Wooble Thank you for the suggestions! I guess I will rewrite the code and try to avoid the use of dicts entirely (or at least at most places...) Will definitely try to replace ints by bools and look at bitmasks. – Joern May 02 '11 at 19:48
  • @Samir From Python's POV, sure. But AFAIK, under the hood, True == (int) 1 and False == (int) 0 in CPython. In other words, it's still a 32-bit value. – Santa May 02 '11 at 19:54
  • 1
    To clarify, the `bool` type is actually a subclass (subtype?) of the `int` type. – Santa May 02 '11 at 20:04
  • @Santa: Yeah, probably. I was more concerned about semantics and intention though. If you're conveying truthness, use a boolean. – Samir Talwar May 02 '11 at 21:12

7 Answers7

67

Looking at the C source for the Python dict operations, you can see that they do a pretty naive (but efficient) copy. It essentially boils down to a call to PyDict_Merge:

PyDict_Merge(PyObject *a, PyObject *b, int override)

This does the quick checks for things like if they're the same object and if they've got objects in them. After that it does a generous one-time resize/alloc to the target dict and then copies the elements one by one. I don't see you getting much faster than the built-in copy().

tshepang
  • 12,111
  • 21
  • 91
  • 136
Daniel DiPaolo
  • 55,313
  • 14
  • 116
  • 115
  • 1
    Sounds like I better rewrite the code to avoid the use of dicts at all - or use a faster data structure that can do the same job. Thank you very much for the answer! – Joern May 02 '11 at 19:49
59

Appearantly dict.copy is faster, as you say.

[utdmr@utdmr-arch ~]$ python -m timeit -s "d={1:1, 2:2, 3:3}" "new = d.copy()"
1000000 loops, best of 3: 0.238 usec per loop
[utdmr@utdmr-arch ~]$ python -m timeit -s "d={1:1, 2:2, 3:3}" "new = dict(d)"
1000000 loops, best of 3: 0.621 usec per loop
[utdmr@utdmr-arch ~]$ python -m timeit -s "from copy import copy; d={1:1, 2:2, 3:3}" "new = copy(d)"
1000000 loops, best of 3: 1.58 usec per loop
utdemir
  • 26,532
  • 10
  • 62
  • 81
  • Thanks for the comparison! Will try to rewrite the code as to avoid the use of dict copying in most places. Thanks again! – Joern May 02 '11 at 19:51
  • 4
    The way to do the last comparison without counting the cost of doing the import every time is with `timeit`'s `-s` argument: `python -m timeit -s "from copy import copy" "new = copy({1:1, 2:2, 3:3})"`. While you're at it, pull out the dict creation as well (for all examples.) – Thomas Wouters May 02 '11 at 19:56
  • Maybe repeat the processes many times is better since there might be some fluctuations of one specific shot. – xiaohan2012 Feb 17 '14 at 18:27
  • 2
    Timeit does that; as it says it loops 1000000 times and averages it. – utdemir Feb 18 '14 at 12:01
  • I have conflicting timings. a = {b: b for b in range(10000)} In [5]: %timeit copy(a) 10000 loops, best of 3: 186 µs per loop In [6]: %timeit deepcopy(a) 100 loops, best of 3: 14.1 ms per loop In [7]: %timeit a.copy() 1000 loops, best of 3: 180 µs per loop – Davoud Taghawi-Nejad Sep 20 '15 at 01:47
  • Well, my answer is older than four years, it's very probable that the implementation and the behavior of the CPython changed much. If you get timings for every one of them, I'll be happy to update the answer. – utdemir Sep 20 '15 at 10:37
  • Python 2.7.10 on a 2014 Macbook Pro: 0.130 / 0.289 / 0.793. Python 3.5.0 : 0.239 / 0.315 / 0.777. – knite Oct 30 '15 at 02:47
12

I realise this is an old thread, but this is a high result in search engines for "dict copy python", and the top result for "dict copy performance", and I believe this is relevant.

From Python 3.7, newDict = oldDict.copy() is up to 5.5x faster than it was previously. Notably, right now, newDict = dict(oldDict) does not seem to have this performance increase.

There is a little more information here.

iandioch
  • 121
  • 1
  • 6
12

Can you provide a code sample so I can see how you are using copy() and in what context?

You could use

new = dict(old)

But I dont think it will be faster.

MikeVaughan
  • 1,441
  • 1
  • 11
  • 18
4

Depending on things you leave to speculation, you may want to wrap the original dictionary and do a sort of copy-on-write.

The "copy" is then a dictionary which looks up stuff in the "parent" dictionary, if it doesn't already contain the key --- but stuffs modifications in itself.

This assumes that you won't be modifying the original and that the extra lookups don't end up costing more.

Alex Brasetvik
  • 11,218
  • 2
  • 35
  • 36
3

The measurments are dependent on the dictionary size though. For 10000 entries copy(d) and d.copy() are almost the same.

a = {b: b for b in range(10000)} 
In [5]: %timeit copy(a)
10000 loops, best of 3: 186 µs per loop
In [6]: %timeit deepcopy(a)
100 loops, best of 3: 14.1 ms per loop
In [7]: %timeit a.copy()
1000 loops, best of 3: 180 µs per loop
Davoud Taghawi-Nejad
  • 16,142
  • 12
  • 62
  • 82
0

I just ran into this problem. I managed to switch to lists by separating the keys and the values, and it helped a lot.

Let's say I have:

d = {
    'a': 5
    'b': 6
}

And, I want to make lots of copies with different values but the same keys.

I can create a lookup table:

lookup_table = {
    'a': 0
    'b': 1
}

And, then use a list:

counts = [5, 6]

To get a value, I do:

index = lookup_table['a']
print(counts[index])

To do copy on write, I do something like:

index = lookup_table['a']
new_counts = counts.copy()
new_counts[index] -= 1

I tried going even further by using Python's native array module instead of lists, but it actually made it slightly slower for me. It may be appropriate if you're storing HUGE amounts of data, though.

Shannon -jj Behrens
  • 4,910
  • 2
  • 19
  • 24