
I'm trying to optimize an algorithm in Python purely for fun / curiosity. I have a list that I'm constantly adding items to and removing items from. I know that the way Python lists are implemented, Python will relocate the list in memory for you depending on its size. For example, if you have a list with 10 members, the 10 pointers will be stored consecutively in memory, but there may not be room for 100 consecutive pointers, since another program might be taking a memory block that's in the way. So as you add more members to the list, at some point Python will relocate the entire list to a different place in memory where there's more room for the list to expand.

I'm curious to know whether there is a custom data structure in Python that behaves like a list, but allows me to avoid the performance cost of doing a relocation. I'm expecting that this data type will ask me, in advance, how many members I foresee it will have, and then it'll allocate a big contiguous space in memory so it won't need to relocate the list as it slowly grows to the number of members I specified.

(Note: I tried using a Numpy array, but I had to keep a separate "needle" variable that tracked the size of the list, and I'm afraid that the overhead of maintaining that needle in Python costs more than the gain.)
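For reference, the NumPy attempt looked roughly like this (a sketch only; the class and attribute names are illustrative, not the exact code I used):

```python
import numpy as np

class NeedleList(object):
    """A fixed-capacity list backed by a NumPy object array, with a
    manually maintained 'needle' marking the logical end of the list."""

    def __init__(self, capacity):
        # Allocate the full contiguous buffer up front.
        self._array = np.empty(capacity, dtype=object)
        self._needle = 0  # current logical length, tracked by hand

    def append(self, item):
        self._array[self._needle] = item
        self._needle += 1

    def pop(self):
        self._needle -= 1
        return self._array[self._needle]

    def __len__(self):
        return self._needle
```

The buffer never relocates, but every `append`/`pop` pays for the extra Python-level bookkeeping on `self._needle`, which is the overhead I'm worried about.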

Ram Rachum
    Not going to close this as a dupe yet, but you should see the answer to this similar question that shows that you're optimizing prematurely: http://stackoverflow.com/questions/311775/python-create-a-list-with-initial-capacity – bgporter May 06 '16 at 12:55
  • 1
    If you're only adding/removing from the ends, [deque](https://docs.python.org/3.5/library/collections.html#collections.deque) is an option: "Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction." – Anonymous May 06 '16 at 12:56
  • @Anonymous deque is a great option, and currently my algorithm that uses deque is the top performer. Regardless, I'm still curious to get an answer about a non-relocating list. – Ram Rachum May 06 '16 at 12:58
  • 1
    Appreciate the highlight on 'purely for fun' so you don't get lots of yelling :). Question - have you been able to measure the performance of this relocation in any way, to give yourself a benchmark? With no reason for thinking this, I thought relocations generally doubled the size of the allocated slots, giving you `log n` growth. Again - not an answer to your question; just wondering if that's relevant/if you have measured it. – dwanderson May 06 '16 at 12:58
  • @dwanderson Haven't measured. That'll be really interesting. I see that the answer here http://stackoverflow.com/a/311833/76701 has it measured but I wish I had a way to measure it in my algorithm, because maybe it's different there. – Ram Rachum May 06 '16 at 13:00
  • Ah, I was just thinking of a "pre-allocation" a la `items = [None] * size`, but that's represented in that link. Even if it's `log n`, I wouldn't expect `n=10000` to be a great indicator; I'd be more intrigued by `n=10**9`, or better yet, comparisons across multiple orders of magnitude. +1 internet karma for whoever does that and posts the results – dwanderson May 06 '16 at 13:03
  • @dwanderson The problem with pre-allocating is that you can no longer use `len(items)` in the algorithm, you have to keep a needle variable that tracks the length of the list, and that can throw off your measurement. I specified that in the question ;) – Ram Rachum May 06 '16 at 13:06
  • 1
    Sounds like you might want a (B-)tree structured list. A little slower but no need for relocation. There is a existing library for it. – Dan D. May 06 '16 at 13:25
  • @DanD. I tried it but it's slower :( Possibly there's more overhead because the b-list is split to multiple lists, which is something I don't need. – Ram Rachum May 06 '16 at 13:29
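
A minimal illustration of the deque pattern suggested in the comments above (a sketch for readers following along, not code from the original thread):

```python
from collections import deque

# A deque is a doubly linked list of fixed-size blocks, so appends and
# pops at either end are O(1) and never relocate the whole structure.
d = deque()
d.append(1)        # push on the right
d.appendleft(0)    # push on the left
right = d.pop()     # removes 1 from the right end
left = d.popleft()  # removes 0 from the left end
```

The trade-off is that indexing into the middle of a deque is O(n), so it only helps when, as here, all the adding/removing happens at the ends.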

1 Answer


Just out of curiosity I implemented a simple benchmark to test the various methods mentioned here. As people predicted, the differences are quite small. That said, for larger data sets pre-allocating the list and keeping track of the size is faster, but of course this only applies when adding/removing items at the end.

Benchmark

import time
from collections import deque

def normal(size):
    res = []
    for x in xrange(size):
        res.append(x)

def prealloc(size):
    res = [0] * size  # pre-allocate the full list up front
    length = 0  # separate length tracking (the "needle")
    for x in xrange(size):
        res[length] = x
        length += 1

def deq(size):
    res = deque()
    for x in xrange(size):
        res.append(x)

FUNCS = [
    ['list', normal],
    ['pre-allocate', prealloc],
    ['deque', deq]
]

size = 10
while size <= 10000000:
    print 'size: {0}'.format(size)
    for n, f in FUNCS:
        start = time.clock()
        for i in xrange(30):
            f(size)
        t = time.clock() - start
        print '{}: {}'.format(n, (t / 30))
    size *= 100
    print ''

Results

size: 10
list: 2.13049681786e-06
pre-allocate: 2.03719038788e-06
deque: 2.00608824456e-06

size: 1000
list: 0.000104425446219
pre-allocate: 8.72415120307e-05
deque: 0.000106369330176

size: 100000
list: 0.00984092031242
pre-allocate: 0.00825001457913
deque: 0.0101434508606

size: 10000000
list: 1.23171166758
pre-allocate: 0.876017370547
deque: 1.05051393959

Tests were run with Python 2.7.10 on Windows 8.

niemmi