How come unpacking is faster than accessing by index?

Question

I'm referring to this question, and especially the comments to the first answer from @David Robinson and @mgilson: Sum the second value of each tuple in a list

The original question was to sum the second value of each tuble:

structure = [('a', 1), ('b', 3), ('c', 2)]

First Answer:

sum(n for _, n in structure)

Second Answer:

sum(x[1] for x in structure)

According to discussion, the first answer is 50% faster.

Once I figured out what the first answer does (coming from Perl, I Googled for the special _ variable means in python), I got wondering how come what appears as a pure subset task (getting only the second element of each tuple vs. getting and binding into variables both elements) is actually slower? Is it a missing opportunity to optimize index access in Python? Am I missing something the second answer does which takes time?

Martijn Pieters · Accepted Answer · 2012-10-23T12:49:45.943

If you take a look at the python bytecode, it becomes quite obvious very quickly why unpacking is faster:

>>> import dis
>>> def unpack_or_index(t=(0, 1)):
...     _, x = t
...     x = t[1]
... 
>>> dis.dis(unpack_or_index)
  2           0 LOAD_FAST                0 (t)
              3 UNPACK_SEQUENCE          2
              6 STORE_FAST               1 (_)
              9 STORE_FAST               2 (x)

  3          12 LOAD_FAST                0 (t)
             15 LOAD_CONST               1 (1)
             18 BINARY_SUBSCR       
             19 STORE_FAST               2 (x)
             22 LOAD_CONST               0 (None)
             25 RETURN_VALUE

The tuple unpacking operation is a simple bytecode (UNPACK_SEQUENCE), while the indexing operation has to call a method on the tuple (BINARY_SUBSCR). The unpack operation can take place, inline, in the python evaluation loop, while the subscription call requires looking up of the function on the tuple object to retrieve the value, using PyObject_GetItem.

The UNPACK_SEQUENCE opcode source code special-cases a python tuple or list unpack where the the sequence length matches the argument length exactly:

        if (PyTuple_CheckExact(v) &&
            PyTuple_GET_SIZE(v) == oparg) {
            PyObject **items = \
                ((PyTupleObject *)v)->ob_item;
            while (oparg--) {
                w = items[oparg];
                Py_INCREF(w);
                PUSH(w);
            }
            Py_DECREF(v);
            continue;
        } // followed by an "else if" statement for a list with similar code

The above code reaches into the native structure of the tuple and retrieves the values directly; no need to use heavy calls such as PyObject_GetItem which have to take into account that the object could be a custom python class.

The BINARY_SUBSCR opcode is only optimized for python lists; anything that isn't a native python list requires a PyObject_GetItem call.

Nevertheless, I’d expect `BINARY_SUBSCRIPT` to be quite an efficient operation (after lookup) whereas `UNPACK_SEQUENCE` isn’t (O(1) vs. O(n), as it were) and the subscript operation requires one less store operation. In fact, your formatting of the code is misleading – both operations take the same number of op codes and you haven’t actually explained why the unpacking is faster than subscripting: in particular, why does the unpacking not also require a method lookup via `PyObject_GetItem`? — Konrad Rudolph, Oct 23 '12 at 11:50
@KonradRudolph: Expanded. Interestingly enough, if you replace the tuple with a list, the tables may well be turned.. — Martijn Pieters, Oct 23 '12 at 12:54

Amber · Answer 2 · 2012-10-23T16:21:27.083

18

Indexing goes through the __getitem__ special method, which thus has to do function lookup and execution for each item. That means that for a list of n items, you wind up doing n lookups/calls.

Unpacking doesn't have to deal with that when working with native lists/tuples; it just goes through __iter__ which is a single call, and then unpacks the resulting sequence in C.

edited Oct 23 '12 at 16:21

answered Oct 23 '12 at 06:11

Amber

507,862
82
626
550

To be clear: are you saying that unpacking doesn’t go via a special method, and hence doesn’t do method lookup? So unpacking only works on native Python objects (`list`, `tuple`) then? – Konrad Rudolph Oct 23 '12 at 11:54
@KonradRudolph No - note the mention of `__iter__`. What I'm saying is that it only has to do that call *once*, and thus such an operation is O(1) compared to the O(n) of calling `__getitem__` for *each item* in the sequence. – Amber Oct 23 '12 at 16:06
@Amber `__getitem__` has to be called exactly as many times as the unpacking method: once for every item in `structure`. And `__iter__` is used in *both* alternatives. – Konrad Rudolph Oct 23 '12 at 16:12
1

@KonradRudolph: Unpacking works for arbitrary sequences, but the tuple/list case has been special-cased and optimized. The underlying C array is used directly in that case. – Martijn Pieters Oct 23 '12 at 17:03

How come unpacking is faster than accessing by index?

2 Answers2

Linked