I have a list `l` and non-negative integers `n`, `start0`, `start1`. I need the most efficient (fastest) way of copying (moving) `l[start0 : start0 + n]` to `l[start1 : start1 + n]`.
I need a shallow copy (copying only references) for my current case, but it would also be interesting to know how to make a fast deep copy of a range.
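For reference, the shallow/deep distinction for a range copy can be sketched like this (deep-copying the extracted slice with `copy.deepcopy` is one way to get a deep range copy; it is not claimed to be the fastest):

```python
import copy

l = [[1], [2], [3], [4]]

# Shallow: slice assignment copies only the references.
l[2:4] = l[0:2]
assert l[2] is l[0]  # same inner list object, not a clone

# Deep: deep-copy the extracted slice first, so nested objects are cloned too.
l[2:4] = copy.deepcopy(l[0:2])
assert l[2] is not l[0] and l[2] == l[0]
```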
Is the following the shortest and most efficient way?

`l[start1 : start1 + n] = l[start0 : start0 + n]`

Since the left and right sides have the same range length `n`, in theory Python could optimize this by copying just that data, without creating any intermediate lists or resizing/re-allocating the original list. But does Python actually do such optimizations?
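A quick stdlib-only sanity check of this, using `timeit` instead of `timerit` (sizes and iteration counts are arbitrary; absolute numbers are machine-dependent):

```python
import timeit

size = 1 << 18
setup = f"l = [0] * {size}"

# Same-length assignment: only 10 references need to be copied.
same = timeit.timeit("l[10:20] = l[0:10]", setup=setup, number=100)
# One-element-longer assignment: the whole tail must be shifted each time.
longer = timeit.timeit("l[10:20] = l[0:11]", setup=setup, number=100)

print(f"same-length: {same:.6f}s, longer: {longer:.6f}s")
```

On CPython the same-length case should come out far faster, consistent with the measurements below.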
I've tried the following measurements. I create lists of different lengths while copying just a short sub-range, in three variants: copying a same-length sub-range, a shorter sub-range, and a longer sub-range. In theory, for a same-length range Python can just copy the short data, which is very efficient, while for shorter or longer sub-ranges Python has to move and/or re-allocate the whole large array.
# Needs: python -m pip install timerit
import math, timerit
for lsize in range(5, 27, 3):
    size = 1 << lsize
    for n in [10, 9, 11]:
        print(f'size = {str(size).rjust(8)}, n = {str(n).rjust(2)}: ', end = '', flush = True)
        for t in timerit.Timerit(num = math.ceil(10 ** 5 / math.sqrt(size)), verbose = 1):
            l = [0] * size
            with t:
                l[10 : 20] = l[0 : n]
outputs:
size = 32, n = 10: Timed best=1.026 µs, mean=1.464 ± 0.2 µs
size = 32, n = 9: Timed best=1.026 µs, mean=1.528 ± 0.2 µs
size = 32, n = 11: Timed best=1.539 µs, mean=1.544 ± 0.1 µs
size = 256, n = 10: Timed best=1.026 µs, mean=1.532 ± 0.1 µs
size = 256, n = 9: Timed best=1.539 µs, mean=1.546 ± 0.1 µs
size = 256, n = 11: Timed best=1.539 µs, mean=2.172 ± 0.2 µs
size = 2048, n = 10: Timed best=1.539 µs, mean=1.585 ± 0.1 µs
size = 2048, n = 9: Timed best=2.565 µs, mean=2.580 ± 0.1 µs
size = 2048, n = 11: Timed best=4.105 µs, mean=4.375 ± 0.3 µs
size = 16384, n = 10: Timed best=2.052 µs, mean=2.125 ± 0.2 µs
size = 16384, n = 9: Timed best=11.802 µs, mean=12.107 ± 0.3 µs
size = 16384, n = 11: Timed best=12.315 µs, mean=12.874 ± 0.7 µs
size = 131072, n = 10: Timed best=3.592 µs, mean=3.907 ± 0.2 µs
size = 131072, n = 9: Timed best=96.988 µs, mean=100.387 ± 1.4 µs
size = 131072, n = 11: Timed best=851.852 µs, mean=873.135 ± 13.3 µs
size = 1048576, n = 10: Timed best=9.750 µs, mean=10.434 ± 0.3 µs
size = 1048576, n = 9: Timed best=1.119 ms, mean=1.127 ± 0.0 ms
size = 1048576, n = 11: Timed best=6.986 ms, mean=7.054 ± 0.0 ms
size = 8388608, n = 10: Timed best=10.263 µs, mean=10.862 ± 0.4 µs
size = 8388608, n = 9: Timed best=9.752 ms, mean=9.813 ± 0.0 ms
size = 8388608, n = 11: Timed best=58.158 ms, mean=58.692 ± 0.4 ms
size = 67108864, n = 10: Timed best=12.829 µs, mean=13.548 ± 0.4 µs
size = 67108864, n = 9: Timed best=78.876 ms, mean=79.061 ± 0.1 ms
size = 67108864, n = 11: Timed best=471.014 ms, mean=473.688 ± 1.9 ms
As we can see, for same-length (10) ranges the copy takes microseconds even for long lists, while copying shorter (9) / longer (11) ranges takes milliseconds. Also, the same-length copy takes roughly comparable time for very short and very long lists (not thousands of times different, as in the 9- and 11-element cases). Finally, replacing a range with a shorter range is significantly faster than replacing it with a longer range: the former only has to shift the existing array data, while the latter may also have to re-allocate the array, since a larger array is needed to store the new data.
I did the measurements above intentionally, just to check that Python is clever enough not to re-create the whole array when a same-length-to-same-length copy is done. The following non-clever, naive implementation of Python's list slice assignment would also be possible: 1) create a new temporary array with a copy of the contents of `l[0 : start0]`, 2) create a new temporary array with a copy of the contents of `l[start1 : start1 + n1]`, 3) create a new temporary array with a copy of the contents of `l[start0 : start0 + n0]`, 4) concatenate 1)-3) into a new large array. Of course I was almost sure Python doesn't do that, but I did the measurements just to re-check. And of course only `n` elements are copied only in the case when the slice expression on the right has the same length as the one on the left. If the left and right sides differ in size, then half (on average) or the whole of the array's data has to be moved, as shown by the measurements for cases 9 and 11.
So it seems that Python does optimize same-length ranges (meaning the whole original list is not re-allocated from scratch; only the range's values (references) are copied). Is there any documentation proving this?
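As far as I know this behaviour is not promised by the language reference, but CPython's implementation (`list_ass_slice` in `Objects/listobject.c`) does distinguish the cases: when the replacement has the same length, only the affected pointers are overwritten (a `memcpy` plus refcount updates); only when the lengths differ does it `memmove` the tail and possibly resize the buffer. A rough Python-level sketch of that logic (the function name and structure here are mine, for illustration only):

```python
def ass_slice_sketch(l, ilow, ihigh, new_items):
    # Rough model of what CPython's list_ass_slice does for l[ilow:ihigh] = new_items.
    d = len(new_items) - (ihigh - ilow)
    if d == 0:
        # Same length: overwrite the n slots in place (a memcpy of
        # pointers in C); the rest of the list is untouched.
        for i, item in enumerate(new_items):
            l[ilow + i] = item
    else:
        # Different length: the tail l[ihigh:] must be shifted by d
        # (a memmove in C), growing or shrinking the buffer as needed.
        tail = l[ihigh:]
        del l[ilow:]
        l.extend(new_items)
        l.extend(tail)
```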
Maybe there is also some even more optimal built-in function for copying same-length ranges? Something like `l.copy_range(start0, n, start1)`. Just as C++ has `std::copy(vec.begin() + start0, vec.begin() + start0 + n, vec.begin() + start1)`, Python might have a built-in C-based method for such copying of references.
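There is no such list method as far as I know, but for homogeneous numeric data a flat, memmove-style copy is available via the stdlib `array` type and `memoryview` (a different data structure than `list`, so only a partial answer):

```python
from array import array

# array stores raw machine values, so a memoryview slice assignment is
# a flat byte-level copy, with no per-element reference handling.
a = array('q', range(1000))
m = memoryview(a)
m[10:20] = m[0:10]   # copy a[0:10] over a[10:20]
```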
And what about a `step`? I.e. copying `l[start0 : start0 + n : step]`.
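Extended slices do work with slice assignment, but then both sides must have exactly the same number of elements, otherwise a `ValueError` is raised. A minimal sketch (the variable values are arbitrary; as far as I know CPython copies stepped slices element by element, so no memmove-style fast path applies here):

```python
l = list(range(20))
n, step, start0, start1 = 4, 2, 0, 1

# Copy every step-th element of the source range into the target range.
# Both extended slices select exactly n elements, so this is allowed.
l[start1 : start1 + n * step : step] = l[start0 : start0 + n * step : step]
```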
The following is also a way to copy a range:

for i in range(n):
    l[start1 + i] = l[start0 + i]

but it is probably not as efficient as the first method.
So what's the most efficient way to copy a range from one place in a list to another place in the same list?
PS. BTW, for my specific task I don't even need to copy the range but to move it from one place to another, i.e. the old region can be None-filled. If a built-in function for moving a region existed in Python, it would be enough for it to use a C function like `memmove()` and zeroize the old region, with no need to copy entries or increment/decrement their ref-counts.
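A hypothetical `move_range` (not a real built-in; the name and signature are mine) can still be expressed in pure Python on top of slice assignment, handling the overlapping case by None-filling only the old positions not covered by the new region. Unlike a true memmove, the intermediate slice temporarily bumps refcounts of the moved entries:

```python
def move_range(l, start0, n, start1):
    """Move l[start0:start0+n] to l[start1:start1+n], None-filling
    whatever part of the old region the new region does not cover."""
    l[start1 : start1 + n] = l[start0 : start0 + n]
    # None-fill the old positions that were not overwritten by the copy.
    for i in range(start0, start0 + n):
        if not (start1 <= i < start1 + n):
            l[i] = None
```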