
I am looking for the Python equivalent of C++'s `vector::reserve()`. I don't know ahead of time how big the list is going to be, but I know it will be fairly large, and I want to avoid as many resizings as possible while the list is being grown inside a deep inner loop.

The only solution I've come up with so far is very cumbersome compared to the `vector::reserve()` idiom: pre-create the list using `[None]*K`, track the size of the list in a separate counter, append or set items into the list as needed, then copy a slice of the list once it's fully constructed. Is there any alternative?
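For reference, a minimal sketch of that workaround (the initial guess `K` and the item source `produce_items` are hypothetical, only there to make the pattern concrete):

def produce_items():
    # Hypothetical stand-in for the deep inner loop that grows the list.
    for i in range(2000):
        yield i * i

K = 1024                   # hypothetical initial guess at the final size
lst = [None] * K           # pre-create the list
count = 0                  # track the real size in a separate counter

for item in produce_items():
    if count < K:
        lst[count] = item  # set into a pre-created slot
    else:
        lst.append(item)   # fall back to appending past the guess
    count += 1

lst = lst[:count]          # copy the filled slice once fully constructed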

Andrew Prock
  • Python lists are built to have fast appends. I don't think you're going to save anything over simply appending to the list inside a loop with your approach. – Joe Kington Oct 11 '11 at 18:35
  • Here are some suggestions: http://stackoverflow.com/questions/537086/reserve-memory-for-list-in-python – Di Zou Oct 11 '11 at 18:36
  • Python doesn't resize after every `append`; it resizes periodically, which seems to be what you're trying to do. Let Python do it for you. – agf Oct 11 '11 at 18:38
  • Python lists are more like C++ lists or deques than vectors in that there's no requirement for the elements to be in contiguous memory, since there are no pointers. – Mike DeSimone Oct 11 '11 at 18:38
  • Note that the benchmark in Di Zou's question assumes that you know the full size beforehand and thus can avoid all resizing (and even then it's a relatively small difference). In your case, the resizing that happens internally will likely beat manual (read: Python-level) resizing any day of the week. It's only when you can avoid *any* reallocation that it makes sense. – Oct 11 '11 at 18:39
  • The usual replacement is `# TODO - figure out how to pre-size the list if it turns out to be a performance bottleneck` – Russell Borogove Oct 11 '11 at 18:40
  • @MikeDeSimone: That's wrong. Plain and simply wrong. There are *only* pointers (to `struct PyObject`), and those are stored sequentially and need to be copied during resizing. The objects don't need to be copied, true, but that's more because they're reference types anyway. What Python calls a `list` is actually pretty close to a `std::vector`. If it were a linked list or deque, it would have quite different performance characteristics for many operations (like accessing an item in the middle or appending at the beginning). – Oct 11 '11 at 18:41
  • @MikeDeSimone - in the CPython implementation, list is in fact a contiguous block of references/pointers to Python objects, and resizing time may be an issue, though the time needed to construct the individual objects in the list usually swamps that. – Russell Borogove Oct 11 '11 at 18:42
  • OK, sorry, I thought we were talking about Python by itself, not CPython bindings and internals. What I meant was that the `[]` operation in Python does not require one parameter to be a pointer as it does in C, and there are no unary `*` or `&` operators for creating or dereferencing pointers, because Python (as in the code you put in a `.py` file) doesn't have pointers. ... I need to get better about phrasing things more clearly so people don't misinterpret them into something scandalous. – Mike DeSimone Oct 11 '11 at 18:53
  • We *are* talking about Python ;) CPython's implementation is just an example; many things (including those mentioned here) can only be implemented sensibly in roughly the way CPython does it. I see what you're saying now - though there are still a few details I could nitpick about ;) - but I don't see how it is relevant to the question. That is, unless you tried to state that resizing a list won't copy the objects contained in there but only references to them. But I take that as given - Python objects are *always* reference types (as in, they can only be accessed through references/pointers). – Oct 11 '11 at 19:06
  • Just how large does this list get? In general how long is the program taking? How long is acceptable? – Steven Rumbalski Oct 11 '11 at 20:05
  • If you want to do this kind of optimization, Python might not be the language for you, to be perfectly honest. Or at least, you could deal with the data through something like NumPy. – Karl Knechtel Oct 11 '11 at 22:21
  • possible duplicate of [Python - Create a list with initial capacity](http://stackoverflow.com/questions/311775/python-create-a-list-with-initial-capacity) – Nils von Barth May 06 '15 at 04:33
  • @NilsvonBarth, reserve() != initial capacity. – Andrew Prock May 07 '15 at 03:33
  • @AndrewProck, in this case they're almost identical, since you only need one initial call, and the basic question is identical: both questions want a list with enough capacity to hold the expected elements, to avoid resizings: “I want to avoid as many resizings as possible”. Formally, `[std::vector::reserve](http://www.cplusplus.com/reference/vector/vector/reserve/)` “Request a change in capacity: Requests that the vector capacity be at least enough to contain n elements.” If you need other features (repeated manual resizing, say), please state them in the question. – Nils von Barth May 08 '15 at 03:47
  • @NilsvonBarth I appreciate your attempts to help, but marking a duplicate when it's not is not constructive. – Andrew Prock May 08 '15 at 19:18
  • @AndrewProck What else are you asking for? There is no reserve in Python. For initial capacity, see other question. Done. – Nils von Barth May 09 '15 at 19:35

3 Answers


For the heck of it, I did some performance testing:

# Python 2 code; on Python 3, replace xrange with range.
def foo(n):
    # Grow an empty list by appending each item.
    x = []
    for y in xrange(n): x.append(y)

def bar(n):
    # Pre-size the list, then fill it by index.
    x = [None] * n
    for y in xrange(n): x[y] = y

def baz(n):
    # This is obviously silly; we could just do range(n)
    # but this way makes for a fairer test
    x = [y for y in xrange(n)]

>>> import timeit
>>> timeit.timeit(lambda:foo(1000000), number=10)
1.761765391970215
>>> timeit.timeit(lambda:bar(1000000), number=10)
0.79829286962450396
>>> timeit.timeit(lambda:baz(1000000), number=10)
0.9904259479906159
>>> timeit.timeit(lambda:foo(10000), number=1000)
1.3354106457664443
>>> timeit.timeit(lambda:bar(10000), number=1000)
0.70596751821813086
>>> timeit.timeit(lambda:baz(10000), number=1000)
0.58049759117432131
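(These timings are from Python 2. A minimal sketch of the same comparison for Python 3, assuming only the `xrange`-to-`range` rename matters:)

import timeit

def foo(n):
    x = []
    for y in range(n):
        x.append(y)

def bar(n):
    x = [None] * n
    for y in range(n):
        x[y] = y

def baz(n):
    x = [y for y in range(n)]

# Time each variant; actual numbers will vary by machine and version.
for f in (foo, bar, baz):
    print(f.__name__, timeit.timeit(lambda: f(1000000), number=10))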
Karl Knechtel

Similarly to `std::vector`, CPython's `list` already preallocates more elements than are required and then grows that allocation in a way that gives O(1) amortized appends. I would therefore leave it at that until I could prove through profiling that this is indeed a bottleneck.
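That over-allocation is easy to observe (a minimal sketch; `sys.getsizeof` reports the list object's current allocation, and the exact byte counts are a CPython implementation detail that varies across versions and platforms):

import sys

lst = []
prev = sys.getsizeof(lst)
for i in range(64):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != prev:
        # The backing array grows in occasional jumps, not on every append.
        print("len=%3d bytes=%d" % (len(lst), size))
        prev = size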

edit: You mention in the comments that you've already done the profiling. In that case, pre-allocating `[None]*n` might be a sensible thing to try, to see if it's really the repeated reallocations that are the bottleneck.

If your array is numeric, I would encourage you to take a look at NumPy.
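For instance, a minimal sketch assuming the data is numeric and the final size `n` is known at least approximately; `np.empty` allocates the whole block once, so filling it never resizes anything:

import numpy as np

n = 1000000
arr = np.empty(n, dtype=np.float64)  # one allocation up front
for i in range(n):
    arr[i] = i * 0.5                 # fill in place, no resizing

In practice you would vectorize the fill (e.g. `np.arange(n) * 0.5`), but the loop mirrors the usage in the question.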

NPE
  • @drewster You should add profiling results to your question, and explain why you think it's growing the list that is the bottleneck, and not just the rest of the `append` operation. – agf Oct 11 '11 at 19:01

To put it plainly, you can create a pre-sized list as follows:

lst = [None] * N # where N = size of list

Of course, you'd have to set each index manually (rather than calling `append()`) and keep track of the last index you used.

Whether you should do that, however, is another question (which our comrades have done a good job of answering).

Derek Springer