230

Code like this often comes up:

l = []
while foo:
    # baz
    l.append(bar)
    # qux

This is really slow if you're about to append thousands of elements to your list, as the list will have to be constantly resized to fit the new elements.

In Java, you can create an ArrayList with an initial capacity. If you have some idea how big your list will be, this will be a lot more efficient.

I understand that code like this can often be refactored into a list comprehension. If the for/while loop is very complicated, though, this is unfeasible. Is there an equivalent for us Python programmers?

Claudiu
  • As far as I know, they are similar to ArrayLists in that they double their size each time. The amortized time of this operation is constant. It isn't as big of a performance hit as you would think. – mmcdole Nov 22 '08 at 21:11
  • Seems like you're right! – Claudiu Nov 23 '08 at 00:43
  • Perhaps pre-initialization isn't strictly needed for the OP's scenario, but sometimes it definitely is needed: I have a number of pre-indexed items that need to be inserted at a specific index, but they come out of order. I need to grow the list ahead of time to avoid IndexErrors. Thanks for this question. – Neil Traft Apr 14 '12 at 16:40
  • @Claudiu The accepted answer is misleading. The highest-upvoted comment under it explains why. Would you consider accepting one of the other answers? – Neal Gokli Aug 31 '18 at 20:35
  • _If the for/while loop is very complicated, though, this is unfeasible_ -- not necessarily. Most complicated loop bodies are prime candidates for conversion to a function which can then be used in a list comprehension. This practice tends to promote good design by abstracting away complexity. For `while` loops with unclear or nondeterministic termination conditions, `itertools` and generators can rescue the logic back to list comprehension land much of the time. – ggorlen Nov 08 '20 at 02:57
  • Pardon my ignorance, but would list comprehension really solve anything? The comprehension generator doesn't a priori know its length in most cases, so performance-wise you would still need to resize the underlying array dynamically during construction, just like the loop in question. – Marandil May 28 '21 at 14:02
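
A minimal sketch of the refactor ggorlen's comment describes (process and the cutoff predicate here are hypothetical stand-ins for a complicated loop body and termination condition):

import itertools

def process(item):
    # stand-in for a complicated loop body
    return item * 2

# for-loop case: push the body into a function, then use a comprehension
results = [process(x) for x in range(10)]

# while-loop case: itertools.takewhile turns a termination condition into
# something a comprehension can consume
source = iter(range(100))
results = [process(x) for x in itertools.takewhile(lambda x: x < 10, source)]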

11 Answers

119

Warning: This answer is contested. See comments.

def doAppend(size=10000):
    result = []
    for i in range(size):
        message = "some unique object %d" % (i,)
        result.append(message)
    return result

def doAllocate(size=10000):
    result = size * [None]
    for i in range(size):
        message = "some unique object %d" % (i,)
        result[i] = message
    return result

Results (each function evaluated 144 times; durations averaged):

simple append 0.0102
pre-allocate  0.0098

Conclusion. It barely matters.

Premature optimization is the root of all evil.

S.Lott
  • Fair enough! I did a similar test and found a difference of 1.00 vs 0.8 or so. I guess it doesn't matter so much. – Claudiu Nov 23 '08 at 00:46
  • I agree 100% with this answer. The only thing I would add is that you may get some lift out of using generator expressions like: list(i for i in xrange(size)) – Jeremy Cantrell Nov 24 '08 at 18:31
  • What if the preallocation method (size*[None]) itself is inefficient? Does the Python VM actually allocate the list at once, or grow it gradually, just like append() would? – haridsv Nov 15 '09 at 03:01
  • @haridsv: "What if the preallocation method (size*[None]) itself is inefficient?" What does that mean? It's equally efficient. It's not inefficient. It's the same. – S.Lott Nov 15 '09 at 16:39
  • What I was wondering is whether Python preallocates the list with the final size, or only grows it on demand (i.e., attempts to grow "size" number of times, just like the regular append()). Just considering all corners... it is simple enough to optimize, so it is quite likely that this case is already optimized, but that remains an assumption unless someone knows the internals for sure. – haridsv Nov 16 '09 at 20:04
  • @haridsv: (1) I can't understand what hair you're splitting. The time is the same. I don't know what other "optimization" you could be talking about. (2) The internals are trivially available -- you can read the source at any time. – S.Lott Nov 17 '09 at 00:45
  • @haridsv: I made some tests by putting the allocation (`result=size*[None]`) outside the function and can't find a significant difference. By replacing the loop body with `result[i] = i`, I then get 1.13 ms versus 0.62 ms with preallocation. – rafak Jul 09 '10 at 21:52
  • @S.Lott: What haridsv is saying is that if int*list is implemented by growing a list one element at a time, then it's no wonder both implementations shown above take the same time. If this were the case, then it is possible that pre-allocating might be much faster - but we haven't yet found any code to test that case. – Jonathan Hartley Jul 14 '10 at 22:31
  • @Tartley: "haven't yet found any code to test that case". Baffling. If it can't be expressed in Python, then what are we talking about? – S.Lott Jul 15 '10 at 00:15
  • Hey. It presumably can be expressed in Python, but nobody has yet posted it here. haridsv's point was that we're just assuming 'int * list' doesn't just append to the list item by item. That assumption is probably valid, but haridsv's point was that we should check that. If it wasn't valid, that would explain why the two functions you showed take almost identical times - because under the covers, they are doing exactly the same thing, hence haven't actually tested the subject of this question. Best regards! – Jonathan Hartley Jul 16 '10 at 08:41
  • This isn't valid; you're formatting a string with each iteration, which takes forever relative to what you're trying to test. Additionally, given that 4% can still be significant depending on the situation, and it's an underestimate... – Philip Guin Jun 04 '12 at 01:23
  • Would it work faster if we already used an instance of what will be the final data type to prefill the list, instead of None? – Antonio Sep 11 '13 at 10:03
  • The other problem with this answer is that the memory usage of both is not the same (sure, asymptotically the same, but potentially twice as much as it needs to be). The real benefit of preallocation is not speed; it's that the thing is exactly as big as it should be. – Neil Traft Mar 30 '14 at 20:15
  • As @Philip points out, the conclusion here is misleading. Preallocation doesn't matter here because the string formatting operation is expensive. I tested with a cheap operation in the loop and found preallocating is almost twice as fast. – Keith Aug 26 '15 at 15:09
  • This answer should be adjusted to address the comments about writing strings inside the loop. Maybe I should say something that is supposed to be witty, as is in this answer... "Incorrect testing of optimizations is the most efficient way to zero credibility." – GaTechThomas Mar 20 '18 at 16:01
  • @Keith, theoretical computer science backs you up here. Depending on the implementation of append, it should be 2 to 3 times faster. https://cs.stackexchange.com/questions/9380/why-is-push-back-in-c-vectors-constant-amortized – Hashimoto Dec 10 '18 at 09:40
  • Wrong answers with many upvotes are yet another root of all evil. – Hashimoto Dec 10 '18 at 09:48
  • Just to be clear, despite the upvotes this answer is most definitely incorrect. – Christopher Barber Jun 22 '19 at 14:31
  • As said above, this answer is _completely_ wrong. See https://stackoverflow.com/a/7733316/2157240 for a correct profiling. – Shmil The Cat Jan 10 '20 at 22:37
  • @Philip It's less than half a millisecond in an interpreter known for being too slow for that magnitude of time to matter. You wouldn't be writing Python code if that level of performance was critical. – jpmc26 Feb 09 '20 at 00:27
  • Who said it was premature optimization? Tell this to any game developer who can anticipate where the 80/20 pressure points will be. – Paul Renton Feb 23 '20 at 11:13
  • As expected, the accepted answer is both subjectively *and* objectively harmful. What's unexpected is that S.Lott – a *very* active Pythonista here – failed to redact his or her spurious claims, preferring instead to push a baseless agenda of "Premature optimization is the root of all evil." **Welcome to 2020,** where even failure is institutionalized. – Cecil Curry Jul 07 '20 at 05:37
  • Using timeit on Python 3.8.5 and removing the string formatting I got **5.68 ms** (pre-allocate) vs **8.26 ms** (append). Almost 50% improvement, not too shabby! – wim Oct 06 '20 at 21:16
  • @CecilCurry The same goes for the equally active OP, who seems unwilling to change the accepted answer. I smell collusion! (joke) But seriously though, guys, why keep unwitting SO users in the dark? – Milo Wielondek Dec 02 '20 at 18:55
  • There is a hugely important use case for this: you receive results from worker threads in arbitrary order, but you must collect them in some logical order. It can be done by appending a record consisting of desired offset and payload, then afterwards sorting by desired offset and extracting the payload column, but that is cumbersome and incurs sorting overhead. It is better to preallocate the list and put each payload immediately at the correct offset. – Szczepan Hołyszewski Nov 21 '22 at 20:34
  • Wouldn't it be better to update the answer, or delete it completely, rather than just say to read the comments? With these kinds of answers, it gets incredibly hard to navigate SO for useful information. – Jeppe Nov 28 '22 at 09:09
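
For reference, a minimal sketch of the cheap-loop-body re-test the comments call for, using timeit (an illustration, not this answer's original benchmark; absolute numbers vary by machine and Python version):

import timeit

def do_append(size=10000):
    result = []
    for i in range(size):
        result.append(i)  # cheap loop body: no string formatting
    return result

def do_allocate(size=10000):
    result = size * [None]
    for i in range(size):
        result[i] = i  # cheap loop body: no string formatting
    return result

# timeit.repeat returns one total per run; take the least noisy (minimum) run
print(min(timeit.repeat(do_append, number=1000)))
print(min(timeit.repeat(do_allocate, number=1000)))
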
97

Python lists have no built-in pre-allocation. If you really need to make a list, and need to avoid the overhead of appending (and you should verify that you do), you can do this:

l = [None] * 1000 # Make a list of 1000 None's
for i in xrange(1000):
    # baz
    l[i] = bar
    # qux

Perhaps you could avoid the list by using a generator instead:

def my_things():
    while foo:
        #baz
        yield bar
        #qux

for thing in my_things():
    # do something with thing

This way, the list is never stored entirely in memory at all; it's merely generated as needed.
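
For instance, a sketch (assuming the values only need to be consumed once) where the generator feeds an aggregation directly, so no million-element list is ever materialized:

def squares(n):
    for i in range(n):
        yield i * i

total = sum(squares(1_000_000))  # values are consumed one at a time
print(total)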

Ned Batchelder
  • +1 Generators instead of lists. Many algorithms can be revised slightly to work with generators instead of fully materialized lists. – S.Lott Nov 22 '08 at 21:23
  • Generators are a good idea, true. I was wanting a general way to do it besides the setting in-place. I guess the difference is minor, though. – Claudiu Nov 23 '08 at 00:57
69

Short version: use

pre_allocated_list = [None] * size

to preallocate a list (that is, to be able to address 'size' elements of the list instead of gradually forming the list by appending). This operation is very fast, even on big lists. Allocating the new objects that will later be assigned to list elements will take much longer and will be the performance bottleneck in your program.

Long version:

I think that initialization time should be taken into account.

Since everything in Python is a reference, it doesn't matter whether you set each element to None or to some string; either way it's only a reference. It will take longer, though, if you want to create a new object for each element to reference.
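
A two-line sketch of that point:

l = [None] * 3
print(l[0] is l[1] is l[2])  # True: three slots, one shared None object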

For Python 3.2:

import time
import copy

def print_timing(func):
    def wrapper(*arg):
        t1 = time.time()
        res = func(*arg)
        t2 = time.time()
        print("{} took {} ms".format(func.__name__, (t2 - t1) * 1000.0))
        return res

    return wrapper

@print_timing
def prealloc_array(size, init=None, cp=True, cpmethod=copy.deepcopy, cpargs=(), use_num=False):
    result = [None] * size
    if init is not None:
        if cp:
            for i in range(size):
                result[i] = init
        else:
            if use_num:
                for i in range(size):
                    result[i] = cpmethod(i)
            else:
                for i in range(size):
                    result[i] = cpmethod(cpargs)
    return result

@print_timing
def prealloc_array_by_appending(size):
    result = []
    for i in range(size):
        result.append(None)
    return result

@print_timing
def prealloc_array_by_extending(size):
    result = []
    none_list = [None]
    for i in range(size):
        result.extend(none_list)
    return result

def main():
    n = 1000000
    x = prealloc_array_by_appending(n)
    y = prealloc_array_by_extending(n)
    a = prealloc_array(n, None)
    b = prealloc_array(n, "content", True)
    c = prealloc_array(n, "content", False, "some object {}".format, ("blah"), False)
    d = prealloc_array(n, "content", False, "some object {}".format, None, True)
    e = prealloc_array(n, "content", False, copy.deepcopy, "a", False)
    f = prealloc_array(n, "content", False, copy.deepcopy, (), False)
    g = prealloc_array(n, "content", False, copy.deepcopy, [], False)

    print("x[5] = {}".format(x[5]))
    print("y[5] = {}".format(y[5]))
    print("a[5] = {}".format(a[5]))
    print("b[5] = {}".format(b[5]))
    print("c[5] = {}".format(c[5]))
    print("d[5] = {}".format(d[5]))
    print("e[5] = {}".format(e[5]))
    print("f[5] = {}".format(f[5]))
    print("g[5] = {}".format(g[5]))

if __name__ == '__main__':
    main()

Evaluation:

prealloc_array_by_appending took 118.00003051757812 ms
prealloc_array_by_extending took 102.99992561340332 ms
prealloc_array took 3.000020980834961 ms
prealloc_array took 49.00002479553223 ms
prealloc_array took 316.9999122619629 ms
prealloc_array took 473.00004959106445 ms
prealloc_array took 1677.9999732971191 ms
prealloc_array took 2729.999780654907 ms
prealloc_array took 3001.999855041504 ms
x[5] = None
y[5] = None
a[5] = None
b[5] = content
c[5] = some object blah
d[5] = some object 5
e[5] = a
f[5] = []
g[5] = ()

As you can see, just making a big list of references to the same None object takes very little time.

Appending or extending takes longer (I didn't average anything, but after running this a few times I can tell you that extending and appending take roughly the same time).

Allocating a new object for each element is what takes the most time. And S.Lott's answer does that: it formats a new string every time. That is not strictly required; if you want to preallocate some space, just make a list of None, then assign data to list elements at will. Either way it takes more time to generate data than to append/extend a list, whether you generate it while creating the list or after. But if you want a sparsely-populated list, then starting with a list of None is definitely faster.

LRN
  • Hmm, interesting. So the answer might be: it doesn't really matter if you're doing any operation to put elements in a list, but if you really just want a big list of all the same element you should use the `[]*` approach. – Claudiu Apr 04 '11 at 13:29
  • As an un-fun aside, this has interesting behavior when done to lists (e.g. to preallocate an `m * n` matrix): `x = 3 * [3 *[0]]` gives `[[0, 0, 0], [0, 0, 0], [0, 0, 0]]`, but then assignment is wonky: `x[0][0] = 1` gives `[[1, 0, 0], [1, 0, 0], [1, 0, 0]]`. – kingb12 Sep 24 '20 at 05:41
  • Yes, because `x = 3 * [3 *[0]]` only allocates two lists. See [this canonical post](https://stackoverflow.com/questions/240178/list-of-lists-changes-reflected-across-sublists-unexpectedly) on the issue. – ggorlen Nov 08 '20 at 02:52
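
A short sketch of the pitfall in the last two comments, plus the standard fix:

rows, cols = 3, 3

# Wrong: the outer * repeats one reference to a single inner list
grid = rows * [cols * [0]]
grid[0][0] = 1
print(grid)  # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

# Right: build a fresh inner list per row
grid = [[0] * cols for _ in range(rows)]
grid[0][0] = 1
print(grid)  # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
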
39

The Pythonic way for this is:

x = [None] * numElements

Or whatever default value you wish to prepopulate with, e.g.

bottles = [Beer()] * 99
sea = [Fish()] * many
vegetarianPizzas = [None] * peopleOrderingPizzaNotQuiche

(Caveat Emptor: The [Beer()] * 99 syntax creates one Beer and then populates an array with 99 references to the same single instance)
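
A minimal sketch of that caveat (Beer here is a stub class, since none is defined in this thread), and the comprehension to use when each slot needs its own instance:

class Beer:
    pass

shared = [Beer()] * 99                  # one Beer, 99 references to it
distinct = [Beer() for _ in range(99)]  # 99 separate Beer instances

print(shared[0] is shared[1])      # True
print(distinct[0] is distinct[1])  # False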

Python's default approach can be pretty efficient, although that efficiency decays as you increase the number of elements.

Compare

import time

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        end = time.time()
        secs = end - self.start
        msecs = secs * 1000  # Millisecs
        print('%fms' % msecs)

Elements   = 100000
Iterations = 144

print('Elements: %d, Iterations: %d' % (Elements, Iterations))


def doAppend():
    result = []
    i = 0
    while i < Elements:
        result.append(i)
        i += 1

def doAllocate():
    result = [None] * Elements
    i = 0
    while i < Elements:
        result[i] = i
        i += 1

def doGenerator():
    return list(i for i in range(Elements))


def test(name, fn):
    print("%s: " % name, end="")
    with Timer() as t:
        x = 0
        while x < Iterations:
            fn()
            x += 1


test('doAppend', doAppend)
test('doAllocate', doAllocate)
test('doGenerator', doGenerator)

with

#include <vector>
typedef std::vector<unsigned int> Vec;

static const unsigned int Elements = 100000;
static const unsigned int Iterations = 144;

void doAppend()
{
    Vec v;
    for (unsigned int i = 0; i < Elements; ++i) {
        v.push_back(i);
    }
}

void doReserve()
{
    Vec v;
    v.reserve(Elements);
    for (unsigned int i = 0; i < Elements; ++i) {
        v.push_back(i);
    }
}

void doAllocate()
{
    Vec v;
    v.resize(Elements);
    for (unsigned int i = 0; i < Elements; ++i) {
        v[i] = i;
    }
}

#include <iostream>
#include <chrono>
using namespace std;

void test(const char* name, void(*fn)(void))
{
    cout << name << ": ";

    auto start = chrono::high_resolution_clock::now();
    for (unsigned int i = 0; i < Iterations; ++i) {
        fn();
    }
    auto end = chrono::high_resolution_clock::now();

    auto elapsed = end - start;
    cout << chrono::duration<double, milli>(elapsed).count() << "ms\n";
}

int main()
{
    cout << "Elements: " << Elements << ", Iterations: " << Iterations << '\n';

    test("doAppend", doAppend);
    test("doReserve", doReserve);
    test("doAllocate", doAllocate);
}

On my Windows 7 Core i7, 64-bit Python gives

Elements: 100000, Iterations: 144
doAppend: 3587.204933ms
doAllocate: 2701.154947ms
doGenerator: 1721.098185ms

While C++ gives (built with Microsoft Visual C++, 64-bit, optimizations enabled)

Elements: 100000, Iterations: 144
doAppend: 74.0042ms
doReserve: 27.0015ms
doAllocate: 5.0003ms

C++ debug build produces:

Elements: 100000, Iterations: 144
doAppend: 2166.12ms
doReserve: 2082.12ms
doAllocate: 273.016ms

The point here is that in Python, preallocation buys a measurable performance improvement (roughly 25% between doAppend and doAllocate above). If you're writing a high-performance application, or something used in a web service, that isn't to be sniffed at; but you may also need to rethink your choice of language.

Also, the Python code here isn't really Python code. Switching to truly Pythonesque code here gives better performance:

import time

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        end = time.time()
        secs = end - self.start
        msecs = secs * 1000  # millisecs
        print('%fms' % msecs)

Elements   = 100000
Iterations = 144

print('Elements: %d, Iterations: %d' % (Elements, Iterations))


def doAppend():
    for x in range(Iterations):
        result = []
        for i in range(Elements):
            result.append(i)

def doAllocate():
    for x in range(Iterations):
        result = [None] * Elements
        for i in range(Elements):
            result[i] = i

def doGenerator():
    for x in range(Iterations):
        result = list(i for i in range(Elements))


def test(name, fn):
    print("%s: " % name, end="")
    with Timer() as t:
        fn()


test('doAppend', doAppend)
test('doAllocate', doAllocate)
test('doGenerator', doGenerator)

Which gives

Elements: 100000, Iterations: 144
doAppend: 2153.122902ms
doAllocate: 1346.076965ms
doGenerator: 1614.092112ms

(In 32-bit builds, doGenerator does better than doAllocate.)

Here the gap between doAppend and doAllocate is significantly larger.

Obviously, the differences here really only apply if you are doing this more than a handful of times, if you are doing it on a heavily loaded system where those numbers will get scaled out by orders of magnitude, or if you are dealing with considerably larger lists.

The point here: Do it the Pythonic way for the best performance.

But if you are worrying about general, high-level performance, Python is the wrong language. The most fundamental problem is that Python function calls have traditionally been up to 300x slower than in other languages, due to Python features like decorators, etc. (PythonSpeed/PerformanceTips, Data Aggregation).

kfsone
  • @NilsvonBarth C++ doesn't have `timeit` – kfsone May 06 '15 at 16:21
  • *Python* has `timeit`, which you should use when timing your Python code; I'm not talking about C++, obviously. – Nils von Barth May 08 '15 at 03:33
  • This isn't a correct answer. `bottles = [Beer()] * 99` does not create 99 Beer objects. Instead, it creates one Beer object with 99 references to it. If you mutate it, all elements in the list will be mutated, because `(bottles[i] is bottles[j]) == True` for every `i != j. 0 <= i, j <= 99`. – erhesto Feb 04 '18 at 17:03
  • @erhesto You judged the answer as not correct because the author used references as an example to fill a list? First, no one is required to create 99 Beer objects (as opposed to one object and 99 references). In the case of prepopulation (which is what he talked about), faster is better, as the value will be replaced later. Second, the answer is not about references or mutation at all. You are missing the big picture. – Yongwei Wu Nov 29 '18 at 09:17
  • @YongweiWu You're actually right. This example doesn't make the whole answer incorrect; it might just be misleading, and it's simply worth mentioning. – erhesto Nov 30 '18 at 09:49
11

As others have mentioned, the simplest way to pre-seed a list is with None.

That being said, you should understand the way Python lists actually work before deciding this is necessary.

In the CPython implementation of a list, the underlying array is always created with extra headroom, in progressively larger sizes (4, 8, 16, 25, 35, 46, 58, 72, 88, 106, 126, 148, 173, 201, 233, 269, 309, 354, 405, 462, 526, 598, 679, 771, 874, 990, 1120, ...), so that resizing the list does not happen nearly so often.

Because of this behavior, list.append() has O(1) amortized complexity; an append only has increased cost when it crosses one of these boundaries, at which point it is O(n). This behavior is what leads to the minimal increase in execution time in S.Lott's answer.

Source: Python list implementation
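
A hedged way to observe that over-allocation in CPython (the exact byte counts are implementation details and vary across versions and platforms):

import sys

l = []
last = sys.getsizeof(l)
for _ in range(64):
    l.append(None)
    size = sys.getsizeof(l)
    if size != last:  # a resize happened: capacity grew by several slots
        print(f"len={len(l)}, bytes={size}")
        last = size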

Russell Troxel
5

I ran S.Lott's code and produced the same 10% performance increase by preallocating. I also tried Ned Batchelder's idea using a generator, and the generator performed better than doAllocate. For my project the 10% improvement matters, so thanks to everyone; this helps a bunch.

# Note: this reuses the print_timing decorator defined in another answer
# on this page, and runs on Python 2 (xrange).
def doAppend(size=10000):
    result = []
    for i in range(size):
        message = "some unique object %d" % (i,)
        result.append(message)
    return result

def doAllocate(size=10000):
    result = size * [None]
    for i in range(size):
        message = "some unique object %d" % (i,)
        result[i] = message
    return result

def doGen(size=10000):
    return list("some unique object %d" % (i,) for i in xrange(size))

size = 1000

@print_timing
def testAppend():
    for i in xrange(size):
        doAppend()

@print_timing
def testAlloc():
    for i in xrange(size):
        doAllocate()

@print_timing
def testGen():
    for i in xrange(size):
        doGen()


testAppend()
testAlloc()
testGen()

Output

testAppend took 14440.000ms
testAlloc took 13580.000ms
testGen took 13430.000ms
Jason Wiener
  • "For my proj the 10% improvement matters"? Really? You can **prove** that list allocation is **the** bottleneck? I'd like to see more on that. Do you have a blog where you could explain how this actually helped? – S.Lott Sep 28 '10 at 10:17
  • @S.Lott try bumping the size up by an order of magnitude; performance drops by 3 orders of magnitude (compared to C++ where performance drops by slightly more than a single order of magnitude). – kfsone Jun 11 '14 at 21:14
  • This could be the case because as an array grows, it might have to be moved around in memory. (Think of how objects are stored there one after the other.) – Evgeni Sergeev May 30 '15 at 12:17
5

Concerns about preallocation arise in Python when you're working with NumPy, which has more C-like arrays. In that case, preallocation is about the shape of the data and the default value.

Consider NumPy if you're doing numerical computation on massive lists and want performance.
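
A minimal NumPy sketch of that idea (these are standard NumPy constructors):

import numpy as np

a = np.zeros((1000, 3))             # shape (1000, 3), default value 0.0
b = np.full((1000, 3), np.nan)      # same shape, default value NaN
c = np.empty(1000, dtype=np.int32)  # uninitialized; fill before reading

print(a.shape, b[0, 0], c.dtype)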

J450n
1

Python lists don't support preallocation. NumPy lets you preallocate memory, but in practice it doesn't seem to be worth it if your goal is to speed up the program.

This test simply writes an integer into the list, but in a real application you'd likely do more complicated things per iteration, which further reduces the importance of the memory allocation.

import timeit
import numpy as np

def list_append(size=1_000_000):
    result = []
    for i in range(size):
        result.append(i)
    return result

def list_prealloc(size=1_000_000):
    result = [None] * size
    for i in range(size):
        result[i] = i
    return result

def numpy_prealloc(size=1_000_000):
    result = np.empty(size, np.int32)
    for i in range(size):
        result[i] = i
    return result

setup = 'from __main__ import list_append, list_prealloc, numpy_prealloc'
print(timeit.timeit('list_append()', setup=setup, number=10))     # 0.79
print(timeit.timeit('list_prealloc()', setup=setup, number=10))   # 0.62
print(timeit.timeit('numpy_prealloc()', setup=setup, number=10))  # 0.73
danijar
0

For some applications, a dictionary may be what you are looking for. For example, in the find_totients function below, I found it more convenient to use a dictionary since I didn't have a zero index.

import math

def totient(n):
    totient = 0

    if n == 1:
        totient = 1
    else:
        for i in range(1, n):
            if math.gcd(i, n) == 1:
                totient += 1
    return totient

def find_totients(max):
    totients = dict()
    for i in range(1, max + 1):
        totients[i] = totient(i)

    print('Totients:')
    for i in range(1, max + 1):
        print(i, totients[i])

This problem could also be solved with a preallocated list:

def find_totients(max):
    totients = [None] * (max + 1)  # was None*(max+1), which raises a TypeError
    for i in range(1, max + 1):
        totients[i] = totient(i)

    print('Totients:')
    for i in range(1, max + 1):
        print(i, totients[i])

I feel that this is less elegant and more prone to bugs, because I'm storing None values, which could throw an exception if I accidentally use one the wrong way, and because I need to think about edge cases that the dictionary lets me avoid.

It's true the dictionary won't be as efficient, but as others have commented, small differences in speed are not always worth significant maintenance hazards.
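
As a small usage sketch (reusing the totient function above), a dict comprehension keeps the keys starting at 1, with no placeholder at index 0:

totients = {i: totient(i) for i in range(1, 11)}  # no unused slot 0
print(totients[10])  # 4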

Josiah Yoder
0

Fastest way: use *, as in list1 = [False] * 1_000_000

Comparing all the common methods (list appending vs. preallocation vs. for vs. while), I found that using * gives the most efficient execution time.

import time

large_int = 10_000_000
start_time = time.time()

# Test 1: List comprehension
l1 = [False for _ in range(large_int)]
end_time_1 = time.time()

# Test 2: Using *
l2 = [False] * large_int
end_time_2 = time.time()

# Test 3: Using append with for loop & range
l3 = []
for _ in range(large_int):
    l3.append(False)
end_time_3 = time.time()

# Test 4: Using append with while loop
l4, i = [], 0
while i < large_int:
    l4.append(False)
    i += 1
end_time_4 = time.time()

# Results
diff_1 = end_time_1 - start_time
diff_2 = end_time_2 - end_time_1
diff_3 = end_time_3 - end_time_2
diff_4 = end_time_4 - end_time_3
print(f"Test 1. {diff_1:.4f} seconds")
print(f"Test 2. {diff_2:.4f} seconds")
print(f"Test 3. {diff_3:.4f} seconds")
print(f"Test 4. {diff_4:.4f} seconds")

print("\nTest 2 is faster than - ")
print(f"            Test 1 by - {(diff_1 / diff_2 * 100 - 1):,.0f}%")
print(f"            Test 3 by - {(diff_3 / diff_2 * 100 - 1):,.0f}%")
print(f"            Test 4 by - {(diff_4 / diff_2 * 100 - 1):,.0f}%")


RaihanShezan
-3

From what I understand, Python lists are already quite similar to ArrayLists. But if you want to tweak those parameters I found this post on the Internet that may be interesting (basically, just create your own ScalableList extension):

http://mail.python.org/pipermail/python-list/2000-May/035082.html

Piotr Lesnicki
  • The link is broken: *"Not Found. The requested URL /pipermail/python-list/2000-May/035082.html was not found on this server."* – Peter Mortensen Dec 21 '20 at 02:09