I decided to turn my set of comments into a proper answer.
First off, let's clarify what's happening when you do:
>>> l = [i for i in range(100000000)]
Here three things are happening:
- 100000000
int
objects are being created. Creating an object in CPython requires allocating memory and placing content into that memory, and this takes time.
- You are running a loop. This affects performances considerably:
[i for i in range(...)]
is much slower than list(range(...))
.
- The large list is being created on the fly.
Reading your question, it seems that you are considering only the last point, ignoring the others. Therefore, your timings are inaccurate: creating a large list does not take 3 seconds, it takes a fraction of those 3 seconds.
How large is this fraction is an interesting question, which is difficult to answer using only Python code, but still we can try. Specifically, I'd try with the following statement:
>>> [None] * 100000000
Here CPython does not have to create a large amount of objects (there's only None
), does not have to run loops and can allocate the memory for the list once (because it knows the size in advance).
Timings are self-explanatory:
$ python3 -m timeit "list(range(100000000))"
10 loops, best of 3: 2.26 sec per loop
$ python3 -m timeit "[None] * 100000000"
10 loops, best of 3: 375 msec per loop
Now, back to your question: how about item deletion?
$ python3 -m timeit --setup "l = [None] * 100000000" "del l[0]"
10 loops, best of 3: 89 msec per loop
$ python3 -m timeit --setup "l = [None] * 100000000" "del l[100000000 // 4]"
10 loops, best of 3: 66.5 msec per loop
$ python3 -m timeit --setup "l = [None] * 100000000" "del l[100000000 // 2]"
10 loops, best of 3: 45.3 msec per loop
These numbers tell us something important. Note that 2 × 45.3 ≈ 89. Also 66.5 × 4 / 3 ≈ 89.
These numbers are telling exactly what linear complexity is about. If a function has time complexity kn (which is O(n)), it means that if we double the input, we double time; if we increase the input size by 4/3, the time increases by 4/3.
And this is what's happening here. In CPython, our list of 100000000 items is a contiguous memory area containing pointers to Python objects:
l = |ptr0|ptr1|ptr2|...|ptr99999999|
When we run del l[0]
we are moving ptr1
from right to left, overwriting ptr0
. The same for other elements:
l = |ptr0|ptr1|ptr2|...|ptr99999999|
^^^^
` item to delete
l = |ptr1|ptr2|...|ptr99999999|
Therefore, when we run del l[0]
we have to move 99999998 pointers to the left. This is different from del l[100000000 // 2]
, which requires moving only half the pointers (the pointers on the first half don't need to be moved). "Moving half the pointers" equals "performing half of the operations" which roughly mean "running in half the time" (this is not always true, but timings say that this is true in this case).