284

Right now I am using a list, and was expecting something like:

verts = list (1000)

Should I use array instead?

Joan Venge
  • 315,713
  • 212
  • 479
  • 689
  • Initialize a collection with a predefined number of elements. – Joan Venge Feb 06 '09 at 19:15
  • Why? Do you have to set elements at random positions? – Torsten Marek Feb 06 '09 at 19:34
  • Knowing the size doesn't help. We usually use dictionaries for this kind of thing and don't waste time creating large, empty structures. – S.Lott Feb 06 '09 at 19:38
  • 10
    Why? I have an collection of items where order is important. Do you guys know the answer to how it's done. Steve's reply seems like the only way. – Joan Venge Feb 06 '09 at 21:04
  • 42
    I am surprised (and feeling a bit sorry for @JoanVenge) by the number comments which are straying all over the place. In my opinion a standard answer should first include how to accomplish a task (howsoever ridiculous it may be) and then admonish/advice the user for the question. It just seems pointless. Questioning the validity of the question can be questioned. – Shashank Sawant Apr 12 '14 at 00:52
  • 15
    @ShashankSawant: Welcome to SO. – Joan Venge Apr 12 '14 at 16:05
  • I have a use case where I have to fill in a list in 2 passes. Pass 1 fills in certain known indices with values. Pass 2 fills in the rest, skipping over items filled in the previous pass. I don't think I can fill in values at specific indices unless I initialize the list. – Raj Aug 17 '15 at 04:32

9 Answers9

398

The first thing that comes to mind for me is:

verts = [None]*1000

But do you really need to preinitialize it?

Steve Losh
  • 19,642
  • 2
  • 51
  • 44
  • that's not a 'need', just a (premature?) optimisation – Javier Feb 06 '09 at 19:09
  • 2
    It's not a premature optimization if you are writing new code. – Joan Venge Feb 06 '09 at 19:11
  • 1
    That's like saying, don't care about performance, just write it. – Joan Venge Feb 06 '09 at 19:11
  • 22
    Yes, that's exactly the point. "Premature optimization is the root of all evil" just means that you should write code without caring about performance - at first. If you find that the code is running slow later on, *then* go back and make optimizations like this one. – David Z Feb 06 '09 at 19:15
  • 5
    I think you are wrong. Premature optimization happens when you try to change something that already works. You should always write the fastest code possible. – Joan Venge Feb 06 '09 at 20:59
  • 32
    No, premature optimization is when you try to optimize code that you aren't certain needs to be optimized. You should NOT always write the fastest code possible -- other concerns like business goals, maintainance cost, engineering time to write it, are often more important. – user26294 Feb 06 '09 at 21:07
  • 4
    Even your phrase validates mine. When you try to optimize code. Well in my case, since there is NO code, obviously I can't do any premature optimization. – Joan Venge Feb 06 '09 at 21:19
  • 5
    To quote a friend of mine, Peter Ritchie: "That's the biggest misinterpretation of a misinterpreted quote. It was Hoare, and the quote is: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."The intention is – Joan Venge Feb 06 '09 at 21:20
  • 7
    that small changes for small improvements in performance don't justify the introduced instability. But, when writing original code, it should always be written with the most performant algorithm. – Joan Venge Feb 06 '09 at 21:21
  • 1
    "We should forget about small efficiencies" - Preinitializing a list of 1000 is premature. – Richard Levasseur Feb 07 '09 at 18:45
  • 2
    1000 is just an example. – Joan Venge Feb 09 '09 at 17:23
  • 104
    Note that there are other legitimate cases than optimizations for the wish to pre-allocate an array. It could be the code that uses it doesn't add elements, just replaces existing ones, so it's more like an array than a list. – Lasse V. Karlsen Sep 10 '09 at 08:08
  • 1
    Maybe it's not about optimization. For example in Dynamic Programming a pre-initialized list is useful. (Array would be better though.) – Georg Schölly Sep 10 '09 at 08:18
  • 61
    This way of initializing a Python array is evil: `a=[[]]*2; a[0].append('foo');` now inspect `a[1]`, and you will be shocked. In contrast `a=[[] for k in range(2)]` works fine. – Joachim W Aug 12 '13 at 21:40
  • 2
    what if you have a graph algorithm and a small amount of memory and the only important thing is that the position is visited or not. This is very useful for a lot of modern stuff. – Andrew Scott Evans Aug 29 '14 at 19:02
  • 17
    Check your assumptions. E.g. I'm currently analyzing a network error rate by parsing a logfile and putting errors in an array of bins, currently 4 bins / hour and 24 hours / day. Hours in a day doesn't change, and if I change bins/hour I'll stop and restart the program, so I always want (currently) 4 * 24 = 96 bins. It seems natural to me (with a C / C++ / C# / etc. background) to start by initializing each bin to 0. How is this an optimization, whether premature or not? – Technophile Dec 28 '14 at 20:19
  • 2
    Correctly using memory is not evil premature optimization. True, beginners shouldn't get hung up on details early on, but I'm an experienced programmer and if I know I want a list of a certain size, and it takes no extra effort to make it the right size, I'm going to do it. Making sure I'm not being dumb with resources prevents me from running into weird bugs later as the software I write expands in terms of user base and features. The real problem is premature optimization that makes your code less readable and maintainable. – Slothario Mar 15 '19 at 21:29
  • I fail to see how, if you know how long a list you want, defining a list of a set size and iterating over it with `for i in list: list[i] = "blah"` will change business goals or increase maintenance costs over `list.append("blah")`, despite the latter being objectively slower. If you want optimised code, then write optimised code. – GMSL Oct 07 '20 at 09:52
  • @JoachimW your comment was quite interesting and indeed I was shocked with the result. Care to explain how did that happen? – Anirudh Goel Sep 07 '21 at 07:04
  • @Anirudh_Goel: It seems, * is copying pointers to arrays, not arrays. Whether this is an accident or a reasonable choice could be a good topic for the Python developers mailing list. – Joachim W Sep 08 '21 at 08:17
85

Not quite sure why everyone is giving you a hard time for wanting to do this - there are several scenarios where you'd want a fixed size initialised list. And you've correctly deduced that arrays are sensible in these cases.

import array
verts=array.array('i',(0,)*1000)

For the non-pythonistas, the (0,)*1000 term is creating a tuple containing 1000 zeros. The comma forces python to recognise (0) as a tuple, otherwise it would be evaluated as 0.

I've used a tuple instead of a list because they are generally have lower overhead.

FoxyLad
  • 1,616
  • 11
  • 12
  • 7
    Some people take "premature" optimization literally I guess. – Joan Venge Dec 07 '09 at 18:28
  • 7
    Thanks! This solution is *exactly* what I was looking for. When profiling, the list initialization was the bottleneck in my code, and this made it 2 times faster. – Frederik Nov 28 '11 at 08:56
  • 81
    Sadly I've yet to find an answer to a Python question on SO which doesn't contain some smug "why would you want to do that?"-type dorm-room arrogance as a standard response. Yay "community". – tomato Apr 03 '13 at 16:15
  • 2
    @mikerodent Joan is a male name in a number of countries around the world, including France, Spain and the Netherlands. – Chris Jul 14 '15 at 19:31
  • @Chris certainly true for Spain, and for all I know N'lands. Not sure about France, having lived there for many years. If I may modify my remark slightly, this particularly annoying tone of aggression may be due to some (anglophone) "dorm-room jocks" **assuming** Joan to be female. – mike rodent Jul 14 '15 at 19:46
  • This answer looks good, but is terribly slow today. In Python2.7, a = [0]*10000000 is several times faster than array.array('i',(0,)*10000000) – Max Jun 18 '18 at 06:37
  • @MAx I thought that array is fast. Anyway, the `i` as first arg to array is for integer. – Timo Dec 31 '20 at 18:44
68

One obvious and probably not efficient way is

verts = [0 for x in range(1000)]

Note that this can be extended to 2-dimension easily. For example, to get a 10x100 "array" you can do

verts = [[0 for x in range(100)] for y in range(10)]
e-shy
  • 681
  • 5
  • 2
38

Wanting to initalize an array of fixed size is a perfectly acceptable thing to do in any programming language; it isn't like the programmer wants to put a break statement in a while(true) loop. Believe me, especially if the elements are just going to be overwritten and not merely added/subtracted, like is the case of many dynamic programming algorithms, you don't want to mess around with append statements and checking if the element hasn't been initialized yet on the fly (that's a lot of code gents).

object = [0 for x in range(1000)]

This will work for what the programmer is trying to achieve.

AndrewC
  • 32,300
  • 7
  • 79
  • 115
Miko
  • 484
  • 4
  • 4
  • 1
    +1. I was afraid if I am doing right thing initialising array with pre-defined size. Your answer makes me calm. – smajli Feb 28 '19 at 09:14
29

@Steve already gave a good answer to your question:

verts = [None] * 1000

Warning: As @Joachim Wuttke pointed out, the list must be initialized with an immutable element. [[]] * 1000 does not work as expected because you will get a list of 1000 identical lists (similar to a list of 1000 points to the same list in C). Immutable objects like int, str or tuple will do fine.

Alternatives

Resizing lists is slow. The following results are not very surprising:

>>> N = 10**6

>>> %timeit a = [None] * N
100 loops, best of 3: 7.41 ms per loop

>>> %timeit a = [None for x in xrange(N)]
10 loops, best of 3: 30 ms per loop

>>> %timeit a = [None for x in range(N)]
10 loops, best of 3: 67.7 ms per loop

>>> a = []
>>> %timeit for x in xrange(N): a.append(None)
10 loops, best of 3: 85.6 ms per loop

But resizing is not very slow if you don't have very large lists. Instead of initializing the list with a single element (e.g. None) and a fixed length to avoid list resizing, you should consider using list comprehensions and directly fill the list with correct values. For example:

>>> %timeit a = [x**2 for x in xrange(N)]
10 loops, best of 3: 109 ms per loop

>>> def fill_list1():
    """Not too bad, but complicated code"""
    a = [None] * N
    for x in xrange(N):
        a[x] = x**2
>>> %timeit fill_list1()
10 loops, best of 3: 126 ms per loop

>>> def fill_list2():
    """This is slow, use only for small lists"""
    a = []
    for x in xrange(N):
        a.append(x**2)
>>> %timeit fill_list2()
10 loops, best of 3: 177 ms per loop

Comparison to numpy

For huge data set numpy or other optimized libraries are much faster:

from numpy import ndarray, zeros
%timeit empty((N,))
1000000 loops, best of 3: 788 ns per loop

%timeit zeros((N,))
100 loops, best of 3: 3.56 ms per loop
lumbric
  • 7,644
  • 7
  • 42
  • 53
4

You could do this:

verts = list(xrange(1000))

That would give you a list of 1000 elements in size and which happens to be initialised with values from 0-999. As list does a __len__ first to size the new list it should be fairly efficient.

John Montgomery
  • 8,868
  • 4
  • 33
  • 43
  • 5
    before python 3.0 it would be range(1000); in python 3.0 it would be list(range(1000)) –  Feb 08 '09 at 18:07
0

You should consider using a dict type instead of pre-initialized list. The cost of a dictionary look-up is small and comparable to the cost of accessing arbitrary list element.

And when using a mapping you can write:

aDict = {}
aDict[100] = fetchElement()
putElement(fetchElement(), fetchPosition(), aDict)

And the putElement function can store item at any given position. And if you need to check if your collection contains element at given index it is more Pythonic to write:

if anIndex in aDict:
    print "cool!"

Than:

if not myList[anIndex] is None:
    print "cool!"

Since the latter assumes that no real element in your collection can be None. And if that happens - your code misbehaves.

And if you desperately need performance and that's why you try to pre-initialize your variables, and write the fastest code possible - change your language. The fastest code can't be written in Python. You should try C instead and implement wrappers to call your pre-initialized and pre-compiled code from Python.

Abgan
  • 3,696
  • 22
  • 31
-2

Without knowing more about the problem domain, it's hard to answer your question. Unless you are certain that you need to do something more, the pythonic way to initialize a list is:

verts = []

Are you actually seeing a performance problem? If so, what is the performance bottleneck? Don't try to solve a problem that you don't have. It's likely that performance cost to dynamically fill an array to 1000 elements is completely irrelevant to the program that you're really trying to write.

The array class is useful if the things in your list are always going to be a specific primitive fixed-length type (e.g. char, int, float). But, it doesn't require pre-initialization either.

user26294
  • 5,532
  • 3
  • 22
  • 18
  • 7
    You don't see the point. I just want to create an list/array with a predefined number of elements. Commenting on why and how I should need is silly. I know what I am doing. Thanks. – Joan Venge Feb 06 '09 at 21:18
  • 3
    When I said, I know what I am doing, I meant programming wise, not python. If I knew python, I wouldn't ask the question, now would I? – Joan Venge Feb 09 '09 at 17:25
  • 2
    Can you edit the question and explain a bit more of the context? From the question, it's not clear what the right answer is, and it's also not clear that you know what you're doing. – user26294 Feb 10 '09 at 19:56
-4

This:

 lst = [8 for i in range(9)]

creates a list, elements are initialized 8

but this:

lst = [0] * 7

would create 7 lists which have one element

Moe Far
  • 2,742
  • 2
  • 23
  • 41
Jutta
  • 513
  • 5
  • 13
  • 10
    `[0] * 7` evaluates to `[0, 0, 0, 0, 0, 0, 0]`, which is a list containing 7 elements. Or are you describing behavior of some very old version of Python? – FooF Jul 20 '16 at 03:17
  • What he said is the list contains 7 elements, but all of the 7 elements point to a same memory. And a modification to any of those 7 elements will lead to others change correspondingly. – York Mar 27 '17 at 12:19
  • 2
    Um, not if the elements are integers, right? I just tried `mylist = [0] * 4`, then after `mylist[0] = 12`, `mylist` returns `[12, 0, 0, 0]` – toonarmycaptain Jan 12 '18 at 14:37