
I encountered a (not very unusual) situation in which I had to use either a map() or a list comprehension expression, and I wondered which one is faster.
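For context, the comparison was roughly of this shape (a minimal sketch; str and the data are just placeholders for the real transformation):

# Rough sketch of the comparison in question: map() versus a list
# comprehension over the same data. `str` is just a stand-in function.
import timeit

setup = "xs = list(range(1000))"

map_time = timeit.timeit("list(map(str, xs))", setup=setup, number=10000)
comp_time = timeit.timeit("[str(x) for x in xs]", setup=setup, number=10000)

print("map():              %.3f s" % map_time)
print("list comprehension: %.3f s" % comp_time)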

This StackOverflow answer provided me with the solution, but then I started to test it myself. The results were basically the same, but I noticed an unexpected behavior when switching to Python 3 that made me curious, namely:

λ iulian-pc ~ → python --version
Python 2.7.6
λ iulian-pc ~ → python3 --version
Python 3.4.3

λ iulian-pc ~ → python -mtimeit '{}'                                                     
10000000 loops, best of 3: 0.0306 usec per loop
λ iulian-pc ~ → python3 -mtimeit '{}'                
10000000 loops, best of 3: 0.105 usec per loop

λ iulian-pc ~ → python -mtimeit 'dict()'
10000000 loops, best of 3: 0.103 usec per loop
λ iulian-pc ~ → python3 -mtimeit 'dict()'
10000000 loops, best of 3: 0.165 usec per loop

I had assumed that Python 3 is faster than Python 2, but several posts (1, 2) showed that this is not the case. Then I thought that maybe Python 3.5 would perform better at such a simple task, as its README states:

The language is mostly the same, but many details, especially how built-in objects like dictionaries and strings work, have changed considerably, and a lot of deprecated features have finally been removed.

But nope, it performed even worse:

λ iulian-pc ~ → python3 --version
Python 3.5.0

λ iulian-pc ~ → python3 -mtimeit '{}'       
10000000 loops, best of 3: 0.144 usec per loop
λ iulian-pc ~ → python3 -mtimeit 'dict()'
1000000 loops, best of 3: 0.217 usec per loop

I've tried to dive into the Python 3.5 source code for dict, but my knowledge of C is not sufficient to find the answer myself (or maybe I'm just not looking in the right place).

So, my question is:

What makes the newer version of Python slower than an older version on a relatively simple task such as defining a dict, when by common sense it should be the other way around? I'm aware that these differences are so small that in most cases they can be neglected. It was just an observation that made me curious why the time increased rather than at least staying the same.

iulian
  • Note that `dict()` is a function call. This requires a lookup for the `dict` function and then calling it, which obviously has much more overhead than built-in syntax that requires no lookups. – Bakuriu May 29 '16 at 07:10
  • @Bakuriu, I know, I just put it to show that both literal and constructor notation have an increase. – iulian May 29 '16 at 08:27
  • Now go ahead and find a reason why `python -m timeit 1` varies between cpython versions. – Dima Tisnek May 30 '16 at 08:14
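To illustrate Bakuriu's point about the extra name lookup, one can compare the bytecode of the two spellings (a small sketch; the exact opcodes vary a bit across versions):

# The literal compiles to a single BUILD_MAP, while dict() has to look
# up the global name `dict` and then make a call.
from dis import dis

dis(lambda: {})       # BUILD_MAP, RETURN_VALUE
dis(lambda: dict())   # LOAD_GLOBAL (dict), CALL_FUNCTION, RETURN_VALUE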

3 Answers


Because nobody cares

The differences you are citing are on the order of tens or hundreds of nanoseconds. A slight difference in how the C compiler optimizes register use could easily cause such changes (as could any number of other C-level optimization differences). That, in turn, could be caused by any number of things, such as changes in the number and usage of local variables in the C implementation of Python (CPython), or even just switching C compilers.

The fact is, nobody is actively optimizing for these small differences, so nobody is going to be able to give you a specific answer. CPython is not designed to be fast in an absolute sense. It is designed to be scalable. So, for example, you can shove hundreds or thousands of items into a dictionary and it will continue to perform well. But the absolute speed of creating a dictionary simply isn't a primary concern of the Python implementors, at least when the differences are this small.
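To get a feel for how small these numbers are, compare them with the spread you get just from repeating the same measurement (a minimal sketch; the repeat and loop counts are arbitrary):

# Measure the run-to-run spread of the `{}` micro-benchmark; at tens of
# nanoseconds per loop, this gives a sense of the noise floor involved.
import timeit

samples = timeit.repeat("{}", repeat=7, number=10**7)
per_loop_usec = [s / 10**7 * 1e6 for s in samples]

print("best:  %.4f usec per loop" % min(per_loop_usec))
print("worst: %.4f usec per loop" % max(per_loop_usec))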

Kevin
  • Thank you for the answer, Kevin. I'm aware of the fact that these differences are tiny, but do you have any idea why the time increased in Python 3.x and not stayed the same? – iulian May 28 '16 at 18:53
  • 14
    @iulian: No one was *trying* to make it stay the same, so it's unsurprising that it went up. It would be equally unsurprising if it went down. Lots of stuff changed between 2.7 and 3.5. – Kevin May 28 '16 at 18:57
  • 1
    The difference in pretty substantial and almost certainly not due to the number of local variables or register allocation by the compiler. Dict is quite fundamental to the internals of python with a whole bunch of special case optimizations. The difference is probably because of a change in the implementation. – pvg May 28 '16 at 19:31
  • @pvg: I *mentioned* a change in the implementation: "That, in turn, could be caused by any number of things, such as changes in the number **and usage** of local variables in the C implementation of Python..." That is an implementation change. – Kevin May 28 '16 at 19:58
  • 3
    @Kevin I think that's technically correct (the best kind of correct!) but the gist of your answer I think misrepresents the scale of both the time difference and implementation change. This didn't happen because they used a couple of more local vars. – pvg May 28 '16 at 20:30
  • @pvg: "This didn't happen because they used a couple of more local vars." - Yes it did, see my comment on the accepted answer. The code is not materially different. – Kevin May 28 '16 at 20:38
  • @Kevin you're assuming _PyDict_NewPresized is the same in 2.7 and 3.5. They are not, nor are the calls they make. – pvg May 28 '16 at 20:56
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/113230/discussion-between-kevin-and-pvg). – Kevin May 28 '16 at 22:03

As @Kevin already stated:

CPython is not designed to be fast in an absolute sense. It is designed to be scalable.

Try this instead:

$ python -mtimeit "dict([(2,3)]*10000000)"
10 loops, best of 3: 512 msec per loop
$
$ python3 -mtimeit "dict([(2,3)]*10000000)"
10 loops, best of 3: 502 msec per loop

And again:

$ python -mtimeit "dict([(2,3)]*100000000)"
10 loops, best of 3: 5.19 sec per loop
$
$ python3 -mtimeit "dict([(2,3)]*100000000)"
10 loops, best of 3: 5.07 sec per loop

That pretty much shows that you can't conclude Python 3 loses to Python 2 based on such an insignificant difference. If anything, Python 3 looks like it should scale better.
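The same idea from inside Python, timing the dict construction itself across a few sizes (a rough sketch; the sizes and repeat counts are arbitrary, and the input list is built in the setup so only dict() is timed):

# Time dict() construction over increasing input sizes; the fixed cost
# of creating the dict object quickly stops mattering.
import timeit

for n in (10**4, 10**5, 10**6):
    setup = "pairs = [(2, 3)] * %d" % n
    best = min(timeit.repeat("dict(pairs)", setup=setup, repeat=5, number=10))
    print("n=%8d: %.4f sec for 10 runs" % (n, best))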

Moses Koledoye

Let's disassemble {}:

>>> from dis import dis
>>> dis(lambda: {})
  1           0 BUILD_MAP                0
              3 RETURN_VALUE

Python 2.7 implementation of BUILD_MAP

TARGET(BUILD_MAP)
{
    x = _PyDict_NewPresized((Py_ssize_t)oparg);
    PUSH(x);
    if (x != NULL) DISPATCH();
    break;
}

Python 3.5 implementation of BUILD_MAP

TARGET(BUILD_MAP) {
    int i;
    PyObject *map = _PyDict_NewPresized((Py_ssize_t)oparg);
    if (map == NULL)
        goto error;
    for (i = oparg; i > 0; i--) {
        int err;
        PyObject *key = PEEK(2*i);
        PyObject *value = PEEK(2*i - 1);
        err = PyDict_SetItem(map, key, value);
        if (err != 0) {
            Py_DECREF(map);
            goto error;
        }
    }

    while (oparg--) {
        Py_DECREF(POP());
        Py_DECREF(POP());
    }
    PUSH(map);
    DISPATCH();
}

It's a little bit more code.
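The extra loop is there because, as of Python 3.5, BUILD_MAP takes the key/value pairs directly from the stack, whereas 2.7 (and 3.4) built the dict empty and filled it with separate STORE_MAP instructions. A quick way to see this difference (a sketch; run it under each interpreter):

# Disassemble a non-empty dict literal under each interpreter.
# On 2.7 and 3.4 you should see BUILD_MAP followed by LOAD_CONSTs and
# STORE_MAP; on 3.5 the constants are pushed first and a single
# BUILD_MAP consumes them from the stack (hence the loop above).
from dis import dis

dis(lambda: {1: 2})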

EDIT:

The Python 3.4 implementation of BUILD_MAP is exactly the same as 2.7's (thanks @user2357112). I dug deeper, and it looks like the minimum size of a dict in Python 3 is 8, the PyDict_MINSIZE_COMBINED constant:

PyDict_MINSIZE_COMBINED is the starting size for any new, non-split dict. 8 allows dicts with no more than 5 active entries; experiments suggested this suffices for the majority of dicts (consisting mostly of usually-small dicts created to pass keyword arguments). Making this 8, rather than 4 reduces the number of resizes for most dictionaries, without any significant extra memory use.

Look at _PyDict_NewPresized in Python 3.4

PyObject *
_PyDict_NewPresized(Py_ssize_t minused)
{
    Py_ssize_t newsize;
    PyDictKeysObject *new_keys;
    for (newsize = PyDict_MINSIZE_COMBINED;
         newsize <= minused && newsize > 0;
         newsize <<= 1)
        ;
    new_keys = new_keys_object(newsize);
    if (new_keys == NULL)
        return NULL;
    return new_dict(new_keys, NULL);
}

and in 2.7

PyObject *
_PyDict_NewPresized(Py_ssize_t minused)
{
    PyObject *op = PyDict_New();

    if (minused>5 && op != NULL && dictresize((PyDictObject *)op, minused) == -1) {
        Py_DECREF(op);
        return NULL;
    }
    return op;
}

In both cases minused is 0 here (the oparg of BUILD_MAP for an empty literal), so neither version grows the table beyond its default.

Python 2.7 creates an empty dict that reuses the small table embedded in the dict object, while Python 3.4 allocates a separate 8-slot keys table (PyDict_MINSIZE_COMBINED) for it.
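For clarity, the size selection in the 3.4 version just picks the smallest power of two that is strictly greater than minused, never smaller than the default of 8; a small Python transcription of that loop (overflow guard omitted):

# Python transcription of the size-selection loop in the 3.4
# _PyDict_NewPresized (the newsize > 0 overflow guard is omitted).
PYDICT_MINSIZE_COMBINED = 8

def presized_table_size(minused):
    newsize = PYDICT_MINSIZE_COMBINED
    while newsize <= minused:
        newsize <<= 1
    return newsize

print(presized_table_size(0))   # 8  -- the empty-literal case
print(presized_table_size(20))  # 32 -- doubles until it exceeds minused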

Tomasz Jakub Rup
  • 3
    Neither loop actually runs because there are no arguments on the stack. The 3.x code is *not materially different* from the 2.x code aside from a few extra not-taken branches and a `PUSH(map); DISPATCH();`. – Kevin May 28 '16 at 20:37
  • @Kevin Yes, but the 3.x code has the branches and must evaluate when to take them and when not. – Tomasz Jakub Rup May 28 '16 at 21:01
  • 4
    Two extra highly-predictable branches don't seem like they'd cause the observed effect. For the `{}` case, it takes 3 times as long! Two more branches out of all the branches and allocations already involved aren't going to do that. For the `dict()` case, none of this code even runs, but the absolute time increase is about the same as the `{}` case. To find the actual root of the slowdown, you need to dig deeper. – user2357112 May 29 '16 at 01:22
  • 1
    My hypothesis would be that the extra allocations involved in the [split-table dict implementation](https://www.python.org/dev/peps/pep-0412/) have a lot to do with the difference - for example, there's now a `PyDictKeysObject` that gets allocated separately through `PyMem_MALLOC` instead of through the free list - but to really confirm this, we'd have to compare the performance on 3.2 (the last version before the split-table implementation) and possibly compile some variant builds of 3.5 where we change or eliminate parts and see how they affect the timing. – user2357112 May 29 '16 at 01:31
  • 2
    Also, the differences you're pointing out [don't even exist on 3.4](https://hg.python.org/cpython/file/3.4/Python/ceval.c#l2377); they were a 3.5 change. – user2357112 May 29 '16 at 01:43