
Is there a way to merge two Python deques in O(1)?

Doubly linked lists can be merged in O(1), and deque is an implementation of a doubly linked list. However, from the documentation I do not see an efficient way to merge two deques. The a.extend(b) and a += b mentioned in this question actually iterate over all the elements of b, so the time complexity is O(len(b)), not O(1).
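For concreteness, here is a small sketch of the two merge operations being asked about (the element values are arbitrary):

```python
from collections import deque

a = deque([1, 2, 3])
b = deque([4, 5, 6])

a.extend(b)     # walks every element of b: O(len(b))
print(list(a))  # [1, 2, 3, 4, 5, 6]

a = deque([1, 2, 3])
a += b          # same cost as extend: O(len(b))
print(list(a))  # [1, 2, 3, 4, 5, 6]
```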

Federico
    Relevant (but not a duplicate AFAICT): https://stackoverflow.com/questions/6256983/how-are-deques-in-python-implemented-and-when-are-they-worse-than-lists – Quuxplusone Mar 05 '20 at 13:59
  • I would suggest implementing a deque-like that is actually a list of deques. – Dan D. Mar 05 '20 at 14:09
  • @DanD. Or rather, a deque of deques. – blhsing Mar 05 '20 at 14:15
  • @Quuxplusone I think you're confused with what's meant by a deque of deques. It means to use a deque as a container for a collection of deque objects, which would allow for O(1) concatenation of 2 deque objects on either ends. A list of deque objects would not allow O(1) concatenation from the left. – blhsing Mar 05 '20 at 14:21
  • Also, merging two deques can be problematic when `maxlen` is defined for either or both of the deques. Unless you're enforcing the rule that `maxlen` cannot be defined for any of the deque objects being merged, you'll need to define your interpretation of what should happen when merging deques with `maxlen`. – blhsing Mar 05 '20 at 14:29
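A minimal sketch of the deque-of-deques idea from the comments (the class and method names here are hypothetical, and it assumes none of the merged sub-deques uses `maxlen`, per blhsing's caveat):

```python
from collections import deque

class DequeChain:
    """Hypothetical wrapper: a deque whose items are themselves deques.

    Merging two chains moves O(number of sub-deques) pointers, not
    O(total elements), so it is O(1) when each chain holds a bounded
    number of sub-deques. Element order is preserved.
    """
    def __init__(self, *parts):
        self._parts = deque(d for d in parts if d)

    def merge_right(self, other):
        # Append other's sub-deques after ours; O(len(other._parts)).
        self._parts.extend(other._parts)
        other._parts = deque()

    def merge_left(self, other):
        # Prepend other's sub-deques before ours, preserving order.
        self._parts.extendleft(reversed(other._parts))
        other._parts = deque()

    def __iter__(self):
        for part in self._parts:
            yield from part

x = DequeChain(deque([1, 2]), deque([3, 4]))
y = DequeChain(deque([5, 6]))
x.merge_right(y)
print(list(x))  # [1, 2, 3, 4, 5, 6]
```

The trade-off is that other deque operations (indexing, `len`, rotation) are no longer O(1) without extra bookkeeping.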

2 Answers


Nope. deques aren't plain doubly linked lists. They're a linked list of blocks of multiple values (on the CPython reference interpreter, each block can contain up to 64 values), where only the start and end blocks may be incomplete; none of the blocks may be sparse. So merging has to fill in the end block of the left-hand side, which means the next block gets filled from a mix of two blocks, and so on.

Beyond that, since there is no such thing as destructive iteration in Python (nothing language-supported, anyway), it can't actually transfer the blocks even if the end block of the left deque and the start block of the right were both full. Copies must happen; block ownership can't be transferred.
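The copying is observable in CPython through reference counts (a sketch; the exact counts returned by `sys.getrefcount` are implementation details, but the delta is what matters):

```python
import sys
from collections import deque

obj = object()
x = deque([obj])
y = deque([obj])
before = sys.getrefcount(obj)

x += y  # copies the pointer out of y's block into a block owned by x
after = sys.getrefcount(obj)

print(after - before)  # 1: x now holds one extra reference to obj
print(len(y))          # 1: y still owns its block and its element
```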

ShadowRanger
  • Can you expand on that last paragraph ("copies must happen, block ownership can't be transferred")? What does it mean, in terms of observable effects, for a copy to "happen" in Python, as opposed to "not happen"? – Quuxplusone Mar 05 '20 at 14:11
  • "Nope. deques aren't plain doubly linked lists." they might not be doubly linked lists at all either, thought the CPython maintainers decided to do otherwise you can also implement deques using a ring buffer (as e.g. Rust does) or an RRB vector for something more persistent. – Masklinn Mar 05 '20 at 14:13
  • @Masklinn: Ring buffers would only work if the `deque` was bounded; if it isn't, and the buffer fills, insertion would not be guaranteed `O(1)`. Not familiar enough with RRB trees to determine if they'd meet the perf requirements (they might actually be *too* fast for the indexing guarantees), but Python's lack of focus on high performance parallel processing suggests it wouldn't benefit much from them. – ShadowRanger Mar 05 '20 at 14:27
  • @Quuxplusone: In CPython, it means the receiving `deque` must allocate additional blocks, request items from the other `deque` which increments their reference count, then the receiving `deque` stores them in the first available slot in the block. The observable effect is that allocated memory increases, and CPU time is spent on the pointer copies and reference count adjustments. But since it's just copying pointers, the objects being "copied" don't matter (it's not an expensive deep copy or anything). – ShadowRanger Mar 05 '20 at 14:30
  • @ShadowRanger amortised O(1) is generally considered O(1). – Masklinn Mar 05 '20 at 14:48
  • @ShadowRanger: I left my own answer. FWIW, as I come from the C++ world, I don't consider things like "CPU and memory usage" to be "observable side effects." The important observable side effect in this case would actually be "the items are now present in _both_ `x` and `y`; you aren't allowed to clear `y` as a result of `x += y`." – Quuxplusone Mar 05 '20 at 15:01
  • @Masklinn: That's not enough though. If you care enough to reach for a `deque`, you often want *consistent* performance & low memory waste. If you resize frequently, that means regularly incurring reallocation costs (for 10M long `list`s that are consumed "efficiently", on my machine, reallocation on first shrink takes 2 ms; when reallocation doesn't happen, it's usually 60 ns, max 20 μs). If you don't reallocate frequently, you violate `deque`'s claim to be "memory efficient" (`list` reallocation delays enough that draining 10M elements leaves ~33/76 MB unused before it finally reallocates). – ShadowRanger Mar 05 '20 at 17:20
  • @Quuxplusone: Yeah, that's what I led with in my second paragraph ("there is no such thing as destructive iteration in Python"). I figured you were looking for some other side-effect, since I'd already mentioned that it's impossible to transfer the blocks wholesale, so the original `deque` couldn't possibly be emptied. Thus my mentioning the performance effects, separate from the logical effects. – ShadowRanger Mar 05 '20 at 17:23
  • @ShadowRanger If you care enough to reach for a deque it's because you don't want O(n) pushfront/popfront because you either need a FIFO container or will need to pushfront a lot. And the memory efficiency is a specific decision of Python, not a specific characteristic of deques. – Masklinn Mar 05 '20 at 18:13
  • @Masklinn: It's one of the things Python specifically guarantees for `collections.deque`, which is what we're talking about; doesn't matter if deques in general don't need to make that guarantee. – ShadowRanger Mar 05 '20 at 21:54

I'm posting this as a separate answer rather than continuing to converse-in-comments under ShadowRanger's good answer. :)

ShadowRanger points out, correctly, that one of deque's invariants is that the blocks in the middle of the list are always 100% full. Therefore, if you had two deques like

X = (. . . 1) (2 3 4 5) (6 7 . .)  [3 blocks, 7 elements]
Y = (8 9 A B) (C D E .)            [2 blocks, 7 elements]

there would be literally no way to concatenate them in O(1) time while preserving order, because deque's invariant does not permit you to express the result as

X+Y = (. . . 1) (2 3 4 5) (6 7 . .) (8 9 A B) (C D E .)  [invalid]

You would have to adjust the positions of all the elements in one or the other of the deques, either like this:

X+Y = (. . . 1) (2 3 4 5) (6 7 8 9) (A B C D) (E . . .)  [adjusted elements 8 thru E]

or like this:

X+Y = (. 1 2 3) (4 5 6 7) (8 9 A B) (C D E .)  [adjusted elements 1 thru 7]

Those are simple pointer swaps, so they're fast; but there are still O(n) of them.


However, suppose you pass in two deques whose alignments just happen to coincide?

X = (. . . 1) (2 3 4 5) (6 7 . .)  [3 blocks, 7 elements]
Y = (. . 8 9) (A B C D) (E . . .)  [3 blocks, 7 elements]

X+Y = (. . . 1) (2 3 4 5) (6 7 8 9) (A B C D) (E . . .)  [can be done in O(1)]

Here, after we manually append items 8 and 9, it does become physically possible to pilfer the entire tail of the right-hand deque in O(1). And deque can detect this possibility in O(1); and manually appending those first few items takes O(block size) = O(1). So yes, it is physically possible for concatenating two deques to be implemented in O(1) time, under special circumstances which do not always hold.

However, you would have to call that operation something other than x += y or x.extend(y). Those two operations are specified not to modify their right-hand operand. The standard deque from collections does not provide any "destructive" operation like that. (But, if it existed, I would expect it to be named splice.)

You can see the implementation of deque's += operator (as implemented in CPython) here.

Federico
Quuxplusone