3

What is the time complexity of iterating, or more precisely each iteration through a deque from the collections library in Python?

An example is this:

elements = deque([1,2,3,4])
for element in elements:
  print(element)

Is each iteration a constant O(1) operation? or does it do a linear O(n) operation to get to the element in each iteration?

There are many resources online for time complexity with all of the other deque methods like appendleft, append, popleft, pop. There doesn't seem to be any time complexity information about the iteration of a deque.

Thanks!

perseverance
  • 6,372
  • 12
  • 49
  • 68
  • 1
    deque = double ended queue. So finding the nth element is `O(n)` – Jean-François Fabre Nov 16 '17 at 20:32
  • 3
    Yes. But you can get linear iteration if you do `for element in elements:` – juanpa.arrivillaga Nov 16 '17 at 20:32
  • 3
    In other words, *iteration over a deque* is linear, but repeatedly indexing into a deque is quadratic. – juanpa.arrivillaga Nov 16 '17 at 20:33
  • @juanpa.arrivillaga now I fully understand why `for i in range(len(elements))` is terribly unpythonic. oh and `len(elements)` is probably `O(n)` unless some caching is implemented. – Jean-François Fabre Nov 16 '17 at 20:34
  • @Jean-FrançoisFabre So just to clarify, at each iteration, it is doing a O(n) operation to get to element[i]? Even though it is essentially a doubly linked list and each movement to the left or right could be O(1)? – perseverance Nov 16 '17 at 20:35
  • 2
    @perseverance yes, if *you do the indexing* that's what has to happen. Fortunately, if you want to iterate over the deque, *you don't have to do that*. You almost **never** should be doing `for i in range(len(elements))`. It is merely ugly and error prone with `list` objects that have random-access, but with `deque` objects you'll feel the pain of an O(N) indexing operation. – juanpa.arrivillaga Nov 16 '17 at 20:36
  • @juanpa.arrivillaga That is what I am looking for. So at each iteration it is in fact a O(1) operation, and as a whole to get to the end of the iteration, it is a O(n) operation. – perseverance Nov 16 '17 at 20:36
  • each iteration (properly done) just takes the next element of the deque, so yes O(1). – Jean-François Fabre Nov 16 '17 at 20:37
  • 1
    @perseverance: It'd be O(1) per iteration if you were doing it right, but you're doing it wrong. – user2357112 Nov 16 '17 at 20:37
  • @perseverance No. using `for i in range(len(my_deque)): print(my_deque[i])` will be quadratic. Using `for x in my_deque: print(x)` will be linear – juanpa.arrivillaga Nov 16 '17 at 20:37
  • @user2357112 How do I do it right? Please clarify! Thanks! – perseverance Nov 16 '17 at 20:37
  • 1
    Iterate over the deque directly, not over the indices, as juanpa.arrivillaga already said. – user2357112 Nov 16 '17 at 20:38
  • Thanks guys! @juanpa.arrivillaga can you please add it as an answer? – perseverance Nov 16 '17 at 20:39
  • As the comments have said, the right way to do it is always to iterate through the thing itself: `for elem in elements`. – Daniel Roseman Nov 16 '17 at 20:39
  • @Jean-FrançoisFabre I don't believe this a duplicate question because its not about random access in a deque. Its about iterating over a deque. – perseverance Nov 16 '17 at 20:41
  • I'm hesitating. It's about wrongly iterating over a deque. I've voted to close, and I won't reopen, but other commenters have the powers to do so, and I don't mind them doing so if they feel it's a different question (it's always nicer when the reopener has the blessing of the closer, not that it's strictly needed but...) – Jean-François Fabre Nov 16 '17 at 20:42
  • also note (in the linked question) that deque is smarter than a simple double ended list. Is uses blocks, so it's better than O(n). – Jean-François Fabre Nov 16 '17 at 20:43
  • 1
    @Jean-FrançoisFabre: It's still O(n). It's just got a better constant factor than if it were a textbook doubly-linked list instead of an unrolled list. – user2357112 Nov 16 '17 at 20:45
  • @user2357112 thanks for the clarification. My knowledge of complexity is also textbook :) – Jean-François Fabre Nov 16 '17 at 20:45
  • I re-opened this, since I do think a question about the complexity of iteration isn't actually answered in the duplicate (and isn't really documented anywhere as far as I can find), so I resorted to empirical testing and checking the CPython source code. My C is very middling, so anyone feel free to correct my interpretation of the source code. – juanpa.arrivillaga Nov 16 '17 at 22:32
  • Did nobody link to the documentation yet? https://docs.python.org/2/library/collections.html#collections.deque "Indexed access is O(1) at both ends but slows to O(n) in the middle" – en_Knight Nov 16 '17 at 23:17
  • @juanpa.arrivillaga, a possible exception to "almost never": How to iterate over the first 10,000 elements of a 100,000-element deque? I see two bad options: Iterating over a range of indexes (slow) or iterating over the deque while maintaining a counter to stop (ugly code). Given the plot below, I would go with the first option. – Rainald62 Jun 25 '18 at 11:48
  • 1
    @Rainald62 use `itertools.islice` – juanpa.arrivillaga Jun 25 '18 at 15:19

1 Answers1

15

If your construction is something like:

elements = deque([1,2,3,4])
for i in range(len(elements)):
    print(elements[i])

You are not iterating over the deque, you are iterating over the range object, and then indexing into the deque. This makes the iteration polynomial time, since each indexing operation, elements[i] is O(n). However, actually iterating over the deque is linear time.

for x in elements:
    print(x)

Here's a quick, empirical test:

import timeit
import pandas as pd
from collections import deque

def build_deque(n):
    return deque(range(n))

def iter_index(d):
    for i in range(len(d)):
        d[i]

def iter_it(d):
    for x in d:
        x

r = range(100, 10001, 100)

index_runs = [timeit.timeit('iter_index(d)', 'from __main__ import build_deque, iter_index, iter_it; d = build_deque({})'.format(n), number=1000) for n in r]
it_runs = [timeit.timeit('iter_it(d)', 'from __main__ import build_deque, iter_index, iter_it; d = build_deque({})'.format(n), number=1000) for n in r]

df = pd.DataFrame({'index':index_runs, 'iter':it_runs}, index=r)
df.plot()

And the resulting plot: enter image description here

Now, we can actually see how the iterator protocol is implemented for deque objects in CPython source code:

First, the deque object itself:

typedef struct BLOCK {
    struct BLOCK *leftlink;
    PyObject *data[BLOCKLEN];
    struct BLOCK *rightlink;
} block;

typedef struct {
    PyObject_VAR_HEAD
    block *leftblock;
    block *rightblock;
    Py_ssize_t leftindex;       /* 0 <= leftindex < BLOCKLEN */
    Py_ssize_t rightindex;      /* 0 <= rightindex < BLOCKLEN */
    size_t state;               /* incremented whenever the indices move */
    Py_ssize_t maxlen;
    PyObject *weakreflist;
} dequeobject;

So, as stated in the comments, a deque is a doubly-linked list of "block" nodes, where a block is essentially an array of python object pointers. Now for the iterator protocol:

typedef struct {
    PyObject_HEAD
    block *b;
    Py_ssize_t index;
    dequeobject *deque;
    size_t state;          /* state when the iterator is created */
    Py_ssize_t counter;    /* number of items remaining for iteration */
} dequeiterobject;

static PyTypeObject dequeiter_type;

static PyObject *
deque_iter(dequeobject *deque)
{
    dequeiterobject *it;

    it = PyObject_GC_New(dequeiterobject, &dequeiter_type);
    if (it == NULL)
        return NULL;
    it->b = deque->leftblock;
    it->index = deque->leftindex;
    Py_INCREF(deque);
    it->deque = deque;
    it->state = deque->state;
    it->counter = Py_SIZE(deque);
    PyObject_GC_Track(it);
    return (PyObject *)it;
}

// ...

static PyObject *
dequeiter_next(dequeiterobject *it)
{
    PyObject *item;

    if (it->deque->state != it->state) {
        it->counter = 0;
        PyErr_SetString(PyExc_RuntimeError,
                        "deque mutated during iteration");
        return NULL;
    }
    if (it->counter == 0)
        return NULL;
    assert (!(it->b == it->deque->rightblock &&
              it->index > it->deque->rightindex));

    item = it->b->data[it->index];
    it->index++;
    it->counter--;
    if (it->index == BLOCKLEN && it->counter > 0) {
        CHECK_NOT_END(it->b->rightlink);
        it->b = it->b->rightlink;
        it->index = 0;
    }
    Py_INCREF(item);
    return item;
}

As you can see, the iterator essentially keeps track of a block index, a pointer to a block, and a counter of total items in the deque. It stops iterating if the counter reaches zero, if not, it grabs the element at the current index, increments the index, decrements the counter, and tales care of checking whether to move to the next block or not. In other words, A deque could be represented as a list-of-lists in Python, e.g. d = [[1,2,3],[4,5,6]], and it iterates

for block in d:
    for x in block:
        ...
ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
juanpa.arrivillaga
  • 88,713
  • 10
  • 131
  • 172
  • Great answer all around. I expect the graph to be monotonic though - why are their big oscillations in the blue curve? – en_Knight Nov 16 '17 at 23:14
  • @en_Knight I could only speculate as to the source, but it's probably mostly noise. – juanpa.arrivillaga Nov 16 '17 at 23:17
  • Probably :) Also, I really like citing the implementation, though it may be worth nothing that this behaviour is not implementation specific, it's also cited in the spec – en_Knight Nov 16 '17 at 23:20
  • @en_Knight I wasn't able to find anything regarding iteration specifically, but if you drop a link I'll add it to the answer. – juanpa.arrivillaga Nov 16 '17 at 23:27