In [1]: x = tuple(range(0,5000000))

In [2]: %timeit x[0:200000]
782 µs ± 8.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [3]: %timeit x[0:200]
350 ns ± 4.43 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

So... it looks like larger tuple slices mean more time spent, and therefore tuples are a poor data structure for operations like `(head, tail) = t[0], t[1:]`. (Also see Tim Peters: https://mail.python.org/pipermail/python-list/2003-May/225347.html)

But because tuples are immutable, I would have thought slices could be implemented to reference the data of the original tuple, instead of internally copying the list of data references, as appears to happen. So, what's up with that? Is there a reason they're implemented the slow way?
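For concreteness, here is a minimal sketch of the view-style slicing the question imagines: a hypothetical `TupleView` wrapper (not anything Python provides) that stores only a reference to the base tuple plus start/stop offsets, so constructing a slice is O(1). Note the trade-off: a view keeps the entire base tuple alive.

```python
class TupleView:
    """Hypothetical read-only view over a tuple slice (illustration only).

    Stores a reference to the base tuple plus offsets, so creating the
    view is O(1) -- but the view also keeps the whole base tuple alive.
    """
    __slots__ = ("_base", "_start", "_stop")

    def __init__(self, base, start=0, stop=None):
        n = len(base)
        self._base = base
        self._start = min(max(start, 0), n)
        self._stop = n if stop is None else min(max(stop, 0), n)

    def __len__(self):
        return max(self._stop - self._start, 0)

    def __getitem__(self, i):
        if isinstance(i, slice):
            start, stop, step = i.indices(len(self))
            if step != 1:
                raise ValueError("only step 1 supported in this sketch")
            # Slicing a view returns another view: still no copying.
            return TupleView(self._base, self._start + start, self._start + stop)
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self._base[self._start + i]

    def __iter__(self):
        for i in range(self._start, self._stop):
            yield self._base[i]


t = tuple(range(5_000_000))
head, tail = t[0], TupleView(t, 1)   # O(1): no element references copied
print(head, tail[0], len(tail))      # 0 1 4999999
```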

user48956
  • If I understand your question correctly, whether slicing handles tuples as references or values has no bearing on its Big O. – Robert Harvey Mar 06 '20 at 22:24
  • @RobertHarvey It would be O(1) if they used references, if I understand OP correctly. In any case, if one looks for performance, Python is not really the way to go. – Acorn Mar 06 '20 at 22:25
  • If it touches every element, it's O(n) regardless of the way each element is touched. – Robert Harvey Mar 06 '20 at 22:26
  • Maybe dupe: [Why are slices in Python 3 still copies and not views?](https://stackoverflow.com/q/6902235/674039) – wim Mar 06 '20 at 22:26
  • 3
    "But because tuples are immutable, I would have thought slices could be implemented to reference the data of the original tuple" - they could, but then you'd be keeping the whole original tuple alive, and trying to fix the resulting memory bugs would be a mess. – user2357112 Mar 06 '20 at 22:26
  • 2
    @RobertHarvey Why would it touch them? – Acorn Mar 06 '20 at 22:26
  • @user2357112supportsMonica C++ has the notion of a smart pointer that neatly handles this. NumPy does something similar under the hood. Why not tuples? – user48956 Mar 06 '20 at 22:27
  • Tuple slices create a shallow copy, by design. See the links I referenced for the "why". – wim Mar 06 '20 at 22:30
  • @user48956: C++ smart pointers don't fix this problem. NumPy has views, but it also offers rich functionality for introspecting and manipulating views, and crucially, making non-view copies of an array. Adding all that to tuples would bloat the interface and bloat the tuple struct without much benefit - advanced view use cases are pervasive throughout NumPy, but not so common with tuples. – user2357112 Mar 06 '20 at 22:31
  • If you were concerned about memory, you could always: `tuple(t[1:])`. – user48956 Mar 06 '20 at 22:31
  • The tuple interface wouldn't change, only the implementation. – user48956 Mar 06 '20 at 22:32
  • 2
    This is fundamentally a question about the design decision, and I don't think you're going to get a more satisfactory answer than "because the implementers didn't consider that a useful thing to implement for the use-cases they had in mind". – juanpa.arrivillaga Mar 06 '20 at 22:32
  • When I say "memory bugs", I don't mean double frees or missing frees or anything like that. I mean trying to hunt down the place in your code where you kept a reference to `t[:2]` where `t[6]` was a 6 gigabyte array. – user2357112 Mar 06 '20 at 22:33
  • 4
    In Java, substring operations used to be O(1) and use shared memory. Because of the memory leaks it caused, they switched to having substrings perform a copy of part of the data, so that the longer data wouldn't be kept from garbage collection. I guess it is the same rationale. – khelwood Mar 06 '20 at 22:34
  • @khelwood -- Sounds like good evidence and rationale. – user48956 Mar 06 '20 at 22:35
  • 1
    @khelwood good point, although on the other hand, Go slices of arrays create views, but that is an explicit feature, so at that point I guess it is the programmers responsibility to keep that in mind. – juanpa.arrivillaga Mar 06 '20 at 22:43
  • Slices of mutable data structures are nuts. Don't do it. – user48956 Mar 06 '20 at 22:44
  • @user48956 that's what `numpy` and Go arrays do. It's not *that nuts* if you understand what you are doing. It is very useful in `numpy`. In any case, `tuple` doesn't sound like it was designed for the use-case you have in mind, which to me seems like you want a cons list. I know Scala lists behave this way, and obviously, Scheme/Lisp. – juanpa.arrivillaga Mar 06 '20 at 22:47
  • Slicing for assignment is useful. It's what to do with a dangling slice that you allocated 50 lines ago on a mutable structure that can be resized (or reshaped) that's conceptually ambiguous. "Understanding what you are doing" needs a mental model of the implementation here. This is less true for immutable data structures. – user48956 Mar 06 '20 at 22:56
  • [What is the rationale for closing "why" questions on language design?](https://meta.stackexchange.com/a/170415/144918) – Charles Duffy Mar 06 '20 at 23:00
  • Small tuples are widely used in Python, such as in passing arguments to and from functions (almost anywhere there are commas). Extra code and attributes that would optimize large slices could harm the more common uses. – hpaulj Mar 07 '20 at 00:40
  • 1
    A 1d object dtype `ndarray` can approximate a tuple with view slicing. – hpaulj Mar 07 '20 at 02:37
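As a concrete illustration of the trade-off discussed in the comments, the standard library's `memoryview` already provides O(1) view-style slicing for binary data: slicing shares the underlying buffer rather than copying it, and, as a consequence, the slice keeps the whole buffer alive until it is released.

```python
data = bytes(range(256)) * 4096        # ~1 MiB buffer
mv = memoryview(data)

view = mv[2:7]                         # O(1): no bytes are copied
print(view.tobytes())                  # b'\x02\x03\x04\x05\x06'
print(view.obj is data)                # True: the view references the base buffer
# The flip side: as long as `view` exists, the entire ~1 MiB buffer
# stays alive, even though the view spans only 5 bytes.

copy = bytes(view)                     # make an explicit copy...
view.release()                         # ...then release the view
```

This is exactly the Java-substring story told above: view slicing is cheap to create, but it forces the programmer to manage the lifetime of the underlying data explicitly.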
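The cons-list alternative mentioned in the comments can be sketched in a few lines of Python (an illustration of the idea, not a tuple replacement): with a linked representation, `(head, tail)` destructuring is O(1) by construction, because the tail already exists as a node and nothing is copied.

```python
from typing import Any, NamedTuple, Optional


class Cons(NamedTuple):
    """One node of an immutable singly linked (cons) list."""
    head: Any
    tail: Optional["Cons"]


def from_iterable(xs):
    """Build a cons list by prepending; pass a reversed iterable
    to preserve the original order."""
    node = None
    for x in xs:
        node = Cons(x, node)
    return node


lst = from_iterable(reversed(range(5)))  # cons list 0 -> 1 -> 2 -> 3 -> 4
head, tail = lst                          # O(1): the tail is shared, not copied
print(head, tail.head)                    # 0 1
```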

0 Answers