11

Suppose we have two lists A = [a1, a2, ..., an] (n elements) and B = [b1, b2, ..., bm] (m elements), and we use "+" in Python to merge the two lists into one:

C = A + B

My question is: what is the runtime of this operation? My first guess is O(n+m), but I'm not sure if Python is smarter than that.

nbro
Toby
    Addition of two lists *will* be O(n+m) because each Python list is implemented as a fixed-size C array. When you add two lists, you are allocating memory for a new array and copying each element of each member list into the array. Using append() and extend() will improve performance to O(m), but if you *must* create a third list and preserve the original two, there's no obvious way to improve on O(n+m). – dylrei Mar 22 '15 at 18:06

3 Answers

16

When you concatenate the two lists with A + B, you create a completely new list in memory. This means your guess is correct: the complexity is O(n + m) (where n and m are the lengths of the lists) since Python has to walk both lists in turn to build the new list.
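You can sanity-check the linear growth empirically. This is a rough timing sketch rather than a rigorous benchmark (absolute numbers will vary by machine), but the time per concatenation should scale roughly with n + m:

```python
import timeit

# Time A + B for increasing list sizes; expect roughly linear growth.
for size in (10_000, 100_000, 1_000_000):
    a = list(range(size))
    b = list(range(size))
    t = timeit.timeit(lambda: a + b, number=100)
    print(f"n = m = {size:>9,}: {t:.4f}s for 100 concatenations")
```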

You can see this happening in the list_concat function in the source code for Python lists:

static PyObject *
list_concat(PyListObject *a, PyObject *bb)
{
/* ...code snipped... */
    src = a->ob_item;
    dest = np->ob_item;
    for (i = 0; i < Py_SIZE(a); i++) {     /* walking list a */
        PyObject *v = src[i];
        Py_INCREF(v);
        dest[i] = v;
    }
    src = b->ob_item;
    dest = np->ob_item + Py_SIZE(a);
    for (i = 0; i < Py_SIZE(b); i++) {     /* walking list b */
        PyObject *v = src[i];
        Py_INCREF(v);
        dest[i] = v;
    }
/* ...code snipped... */
}

If you don't need a new list in memory, it's often a good idea to take advantage of the fact that lists are mutable (and this is where Python is smart). Using A.extend(B) is O(m), meaning you avoid the overhead of copying list A.
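To make the difference concrete (a small sketch; `A += B` is equivalent to `A.extend(B)`):

```python
A = [1, 2, 3]
B = [4, 5, 6]

C = A + B      # builds a new list; O(n + m), both lists are copied
A.extend(B)    # mutates A in place; O(m), only B's elements are copied

print(C)  # [1, 2, 3, 4, 5, 6]
print(A)  # [1, 2, 3, 4, 5, 6] -- A itself was modified
```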

The complexity of various list operations is listed here on the Python wiki.

Alex Riley
2

My first guess is O(n+m), not sure if Python is smarter than that.

Nothing can be smarter than that while returning a copy. Even if A and B were immutable sequences such as strings, CPython would still make a full copy instead of aliasing the same memory (this simplifies the implementation of garbage collection for such objects).

In some specific cases, the operation can be O(1) depending on what you want to do with the result. For example, itertools.chain(A, B) lets you iterate over all items without making a copy (so changes to A or B affect the yielded items). If you need random access, you can emulate it with a Sequence subclass, e.g., WeightedPopulation, but in the general case the copy, and therefore the O(n+m) runtime, is unavoidable.
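A quick illustration of the chain approach. Note that because no copy is made, a mutation of A before the chain is consumed shows up in the iteration:

```python
from itertools import chain

A = [1, 2]
B = [3, 4]
combined = chain(A, B)   # O(1): no elements are copied yet

A.append(99)             # visible to the not-yet-consumed chain
print(list(combined))    # [1, 2, 99, 3, 4]
```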

jfs
1

Copying a list is O(n) (with n being the number of elements) and extending is O(k) (with k being the number of elements in the second list). Based on these two facts, I would think it couldn't be any less than O(n+k): concatenation is a copy-and-extend operation, and at the very least you need to copy all the elements of both lists.

Source: Python TimeComplexity
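One way to see the decomposition (a sketch of the equivalence, not CPython's actual code path):

```python
A = [1, 2, 3]
B = [4, 5, 6]

# A + B behaves like copying A (O(n)) and then extending with B (O(k)).
C = A.copy()   # O(n)
C.extend(B)    # O(k)

assert C == A + B  # same result as concatenation
```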

ARAT
TheBlackCat