
CPython's deque is implemented as a doubly-linked list of 64-item "blocks" (fixed-size arrays). The blocks are all full, except for the ones at either end of the linked list. IIUC, a block is freed when a pop/popleft removes the last item in it, and a new block is allocated when an append/appendleft finds the relevant end block full.
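To make the layout concrete, here is a toy Python sketch (not CPython's actual C code; the class and field names are mine) of a deque built from a doubly-linked list of 64-slot blocks, with just `append` and `popleft`:

```python
# Toy illustration of a linked list of fixed-size blocks.
# BLOCK_LEN matches CPython's 64; everything else is simplified.

BLOCK_LEN = 64

class _Block:
    __slots__ = ("data", "prev", "next")
    def __init__(self):
        self.data = [None] * BLOCK_LEN   # fixed-size array of item slots
        self.prev = None                 # link to the block on the left
        self.next = None                 # link to the block on the right

class BlockDeque:
    def __init__(self):
        b = _Block()
        self.left = self.right = b       # leftmost / rightmost blocks
        # first used slot, and one past the last used slot
        self.lo = self.hi = BLOCK_LEN // 2
        self.size = 0

    def append(self, item):
        if self.hi == BLOCK_LEN:         # rightmost block is full:
            b = _Block()                 # allocate a fresh block
            b.prev = self.right
            self.right.next = b
            self.right = b
            self.hi = 0
        self.right.data[self.hi] = item
        self.hi += 1
        self.size += 1

    def popleft(self):
        if self.size == 0:
            raise IndexError("pop from an empty deque")
        item = self.left.data[self.lo]
        self.left.data[self.lo] = None
        self.lo += 1
        self.size -= 1
        if self.lo == BLOCK_LEN and self.left.next is not None:
            self.left = self.left.next   # drop the emptied block
            self.left.prev = None
            self.lo = 0
        return item
```

Note that each `append`/`popleft` touches at most one block boundary, so the cost per operation is constant in the worst case.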

I understand the listed advantages of using a linked list of blocks rather than a linked list of items:

  • reduce memory cost of pointers to prev and next in every item
  • reduce runtime cost of doing malloc/free for every item added/removed
  • improve cache locality by placing consecutive pointers next to each other
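The first bullet is easy to quantify. A back-of-the-envelope calculation (assuming a 64-bit build, 8-byte pointers; the variable names are mine):

```python
# Link-pointer overhead per stored item: node-per-item linked list
# vs. a 64-slot block whose two links are shared by all 64 items.
PTR = 8                                  # bytes per pointer (64-bit)
BLOCK_LEN = 64

per_item_links = 2 * PTR                 # node-per-item: prev + next
per_block_links = 2 * PTR / BLOCK_LEN    # block links amortized over 64 items

print(per_item_links)    # 16 bytes of link overhead per item
print(per_block_links)   # 0.25 bytes of link overhead per item
```

So blocks cut the link overhead by a factor of 64, ignoring per-block header/rounding overhead.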

But why wasn't a single dynamically-sized circular array used instead of the doubly-linked list in the first place?

AFAICT, the circular array would preserve all the above advantages, and maintain the (amortized) cost of pop*/append* at O(1) (by overallocating, just like in list). In addition, it would improve the cost of lookup by index from the current O(n) to O(1). A circular array would also be simpler to implement, since it can reuse much of the list implementation.
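A minimal sketch of the proposed alternative, assuming a doubling growth policy like `list`'s (the class name and growth factor are illustrative, not anyone's actual implementation):

```python
# Circular-array deque: one overallocated buffer plus head/size
# indices, giving amortized-O(1) pushes/pops at both ends and
# O(1) lookup by index.

class RingDeque:
    def __init__(self):
        self.buf = [None] * 8            # overallocated backing array
        self.head = 0                    # index of the first item
        self.size = 0

    def _grow(self):
        # realloc-style resize: allocate a bigger buffer and copy,
        # unrolling the wraparound so head lands back at index 0
        new = [None] * (2 * len(self.buf))
        for i in range(self.size):
            new[i] = self.buf[(self.head + i) % len(self.buf)]
        self.buf, self.head = new, 0

    def append(self, item):
        if self.size == len(self.buf):
            self._grow()
        self.buf[(self.head + self.size) % len(self.buf)] = item
        self.size += 1

    def appendleft(self, item):
        if self.size == len(self.buf):
            self._grow()
        self.head = (self.head - 1) % len(self.buf)
        self.buf[self.head] = item
        self.size += 1

    def popleft(self):
        if self.size == 0:
            raise IndexError("pop from an empty deque")
        item = self.buf[self.head]
        self.buf[self.head] = None
        self.head = (self.head + 1) % len(self.buf)
        self.size -= 1
        return item

    def __getitem__(self, i):            # O(1) lookup by index
        if not 0 <= i < self.size:
            raise IndexError(i)
        return self.buf[(self.head + i) % len(self.buf)]
```

The cost of the occasional `_grow` copy is exactly where the "amortized" qualifier on O(1) comes from.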

I can see an argument in favor of a linked list in languages like C++, where removal of an item from the middle can be done in O(1) using a pointer or iterator; however, python deque has no API to do this.

Dimitris Fasarakis Hilliard
max
  • There are no compelling advantages to the linked list. Unless someone can dig up a relevant mailing list discussion or something, all we can do is chalk it up to some guy's whim at the time `collections.deque` was introduced. – user2357112 Jul 16 '17 at 23:25
  • Depending on the outcome of this question, maybe implement your idea, run it against the CPython tests, make some benchmarks, and submit a pull request. – Alex Hall Jul 16 '17 at 23:28
  • 2
    @user2357112 I would agree with you, except that the guy in question is super-skilled and never does anything without a careful analysis :) – max Jul 16 '17 at 23:28
  • 1
    Digging up [Raymond Hettinger's `collections` module proposal to Python-Dev](https://mail.python.org/pipermail/python-dev/2004-January/041829.html), probably one of the first public discussions about `collections.deque`, there's no rationale given for the linked-list structure, and I haven't seen any replies discussing it. The closest thing to a rationale is "like we used for itertools.tee()", suggesting maybe they wanted to reuse existing code (with modifications for the double links). – user2357112 Jul 16 '17 at 23:41
  • @user2357112 interesting... still, thinking through what I may be overlooking -- is there something attractive about memory allocations for a linked list of blocks vs a circular array? I found [this](https://stackoverflow.com/a/39324326/336527) argument in favor of the more frequent but small memory allocations of a linked list of blocks vs. less frequent but larger allocations of a circular array; but I'm not convinced (in fact, I would argue the opposite). – max Jul 16 '17 at 23:54
  • (To readers from the future: if mailing list links in this discussion break, as they often seem to do, try searching the [archives for that month of discussions](https://mail.python.org/pipermail/python-dev/2004-January/thread.html).) – user2357112 Jul 16 '17 at 23:55
  • Actually the C++ `std::deque` (in all implementations I know of) uses an array of pointers to blocks, not a linked list. So insertion/deletion from the middle is still linear time. What the standard requires from `std::deque` is that append/prepend do not cause objects in the deque to move (i.e. invalidate pointers into the deque). Also the standard requires access by index to be O(1). Only the latter would apply to Python, of course. – Nemo Jul 17 '17 at 00:03
  • @Nemo Agreed... I linked that post because I thought (perhaps mistakenly) it argued that, ignoring any semantics of invalidation, there's some advantage to many smaller mallocs rather than a few large mallocs. – max Jul 17 '17 at 00:08
  • 3
    [This](https://www.mail-archive.com/python-dev@python.org/msg25024.html) message from RDH seems like the answer to your question, not as verbose as you'd probably want. The [previous message](https://www.mail-archive.com/python-dev@python.org/msg25022.html) discusses the alternative implementation. – Dimitris Fasarakis Hilliard Jul 17 '17 at 00:33
  • @JimFasarakisHilliard Ah perfect! Do you mind making that an answer, so I can accept it for the benefit of everyone else reading this? What I missed is (1) the current approach does not do frequent allocations, since it keeps reusable blocks on an internal free list, defeating one of the advantages I saw in a circular array; (2) the current approach completely avoids the data movement (while pointer invalidation isn't an issue in Python, the extra overhead of copying still is); (3) access to items in the middle isn't of much benefit for the intended use case. – max Jul 17 '17 at 00:44
  • @max I'd suggest you post a self-answer, including your observations and any replies you might get from the Python-dev posting you did a couple of days back :-) – Dimitris Fasarakis Hilliard Jul 17 '17 at 00:54
  • I can do that, but even better if @TimPeters were to copy the answer he just wrote in response to my post [here](https://mail.python.org/pipermail/python-dev/2017-July/148589.html) :) – max Jul 17 '17 at 02:39
  • I'm afraid pinging in comments w/o the user actually participating doesn't do the trick (at least I remember seeing that somewhere on Meta). – Dimitris Fasarakis Hilliard Jul 17 '17 at 02:49

2 Answers


Adapted from my reply on the python-dev mailing list:

The primary point of a deque is to make popping and pushing at both ends efficient. That's what the current implementation does: worst-case constant time per push or pop regardless of how many items are in the deque. That beats "amortized O(1)" in the small and in the large. That's why it was done this way.

Some other deque methods are consequently slower than they are for lists, but who cares? For example, the only indices I've ever used with a deque are 0 and -1 (to peek at one end or the other of a deque), and the implementation makes accessing those specific indices constant-time too.
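For concreteness, this is the indexing pattern described above, using the real `collections.deque`:

```python
# Peeking at the ends of a deque: the two indices that matter in
# practice, and the two the implementation makes constant-time.
from collections import deque

dq = deque(range(1000))
head = dq[0]      # O(1): first item
tail = dq[-1]     # O(1): last item
# Middle indices like dq[500] also work, but cost O(n) block hops.
```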

Indeed, the message from Raymond Hettinger referenced by Jim Fasarakis Hilliard in his comment:

https://www.mail-archive.com/python-dev@python.org/msg25024.html

confirms that

> The only reason that `__getitem__` was put in was to support fast access to the head and tail without actually popping the value

Tim Peters

In addition to accepting @TimPeters' answer, I wanted to add a couple of additional observations that don't fit into a comment.

Let's just focus on a common use case where deque is used as a simple FIFO queue.

Once the queue reaches its peak size, the circular array needs no further heap allocations. I thought of this as an advantage, but it turns out the CPython implementation achieves the same thing by keeping a list of reusable memory blocks. A tie.
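The free-list idea can be sketched in a few lines (the cap of 16 cached blocks matches CPython's `MAXFREEBLOCKS`; the function names here are mine):

```python
# Keep up to MAXFREE emptied blocks for reuse instead of returning
# them to the allocator, so a steady put/get FIFO stops allocating
# once warmed up.
MAXFREE = 16
_freeblocks = []

def new_block():
    if _freeblocks:
        return _freeblocks.pop()     # reuse a cached block: no allocation
    return [None] * 64               # otherwise really allocate one

def free_block(b):
    if len(_freeblocks) < MAXFREE:
        _freeblocks.append(b)        # cache it for the next append
    # else: drop it and let the allocator reclaim the memory
```

In the steady state, every `free_block` from a get feeds the next `new_block` from a put.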

While the queue size is growing, both the circular array and the CPython implementation need memory from the heap. CPython needs only a simple malloc per block, while the array needs a (potentially much more expensive) realloc: unless extra space happens to be available at the right edge of the original memory block, it must allocate new memory, copy the data over, and free the old block. Advantage to CPython.

If the queue peaked out at a much larger size than its stable size, both CPython and the array implementation would waste the unused memory (the former by saving it in a reusable block list, the latter by leaving the unused empty space in the array). A tie.

As @TimPeters pointed out, the cost of each FIFO queue put / get is always O(1) for CPython, but only amortized O(1) for the array. Advantage to CPython.
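The difference is easy to visualize by counting element copies in a simulated doubling circular array (the starting capacity and function name are illustrative):

```python
# Simulate a circular array that doubles when full, counting how
# many existing items each append has to copy. Most appends copy
# nothing; a resize copies the whole deque at once, which is the
# latency spike that worst-case-O(1) CPython never has.
def copies_per_append(n, start=8):
    cap, size, copies = start, 0, []
    for _ in range(n):
        if size == cap:
            copies.append(size)   # resize: copy every existing item
            cap *= 2
        else:
            copies.append(0)      # room available: no copying
        size += 1
    return copies

c = copies_per_append(100)
print(max(c))            # worst single append copies 64 items
print(sum(c) / len(c))   # average stays bounded (1.2 here): amortized O(1)
```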

max
  • 2
    Just noting that in the last case (deque peaked at a much larger size than its stable size), the space "wasted" by the deque implementation is bounded by a small constant: the internal block free list contains at most `MAXFREEBLOCKS` deque blocks, which is 16. If that many blocks are already on the list, subsequently freed blocks are returned to "the system' via `PyMem_Free()`. The block free list is really aimed at efficiency in the common FIFO cases where objects are pushed and popped at about the same rate. – Tim Peters Jul 21 '17 at 22:19
  • @TimPeters Ah nice, another advantage to CPython implementation. A great example of the benefit of focusing on one important use case. – max Jul 22 '17 at 00:53
  • @max Thanks for this great question and answer. But I am still very confused. The discussions here argue that a doubly-linked list implementation of deque is always O(1), while circular array is amortized O(1), which is a benefit of linked list implementation. But isn't pointer referencing much slower than array indexing? Even though array size is resized, the double array space is allocated all at once. But in the case of linked list, a new memory unit is allocated (via malloc) every time a new item is inserted (isn't this slower?). – Peng Nov 21 '20 at 22:26
  • Also, look at Java's implementations of the Deque interface. Java offers both approaches: i.e., LinkedList, a doubly-linked list implementation of Deque, and ArrayDeque, a circular array implementation of Deque. It is explicitly stated in Java documentation that "This class (ArrayDeque) is likely to be faster than Stack when used as a stack, and faster than LinkedList when used as a queue." See: https://docs.oracle.com/en/java/javase/15/docs/api/java.base/java/util/ArrayDeque.html – Peng Nov 21 '20 at 22:30