2

If I have a list 10,000 entries in length, is it faster to:

  1. Make a blank list then append to it.

Or

  1. Make a list filled with 10,000 blank entries and set each entry to the data.

Example Code

# first case 
a=[]
for i in range(10000):
    a.append(input())
# second case
a= [0]*10000
for i in range(10000):
    a[i] = input()
minssi
  • 333
  • 1
  • 2
  • 10
  • use list comprehension, faster then both: `a = [input() for _ in range(10000)]` – Tadhg McDonald-Jensen Apr 11 '16 at 01:17
  • 1
    Write the simplest, most comprehensible solution first, then profile, then optimize. – Jonathon Reinhart Apr 11 '16 at 01:33
  • Actually i am solving some problem in online judge web site,my code's input method is run by site's some program, i make problem's core algorithm, but i fail by time over, so i make my code faster in every direction – minssi Apr 11 '16 at 01:35
  • @김민성: Usually, if you outright fail the time limit on something like that, you need algorithmic improvements. Microoptimizations like this won't be enough, even if you do a lot of them. – user2357112 Apr 11 '16 at 01:42
  • @user2357112 , i see another people's C++ code(success), algorithm is same, so i think that i write my code more pythonic(?) – minssi Apr 11 '16 at 01:58
  • @김민성 if you fail on time have you done checks that your code is actually doing the right thing? It might be possible you have an infinite loop, although do know that python generally runs slower then other compiled code. – Tadhg McDonald-Jensen Apr 11 '16 at 02:43

3 Answers3

7

the timeit module is great for testing this sort of thing:

# first case
def test1():
    a=[]
    for i in range(10000):
        a.append(1)
# second case
def test2():
    a= [0]*10000
    for i in range(10000):
        a[i] = 1

#list comprehension
def test3():
    a = [1 for _ in range(10000)]

import timeit
n = 10000
print("appending:    ",timeit.timeit(test1,number=n))
print("assigning:    ",timeit.timeit(test2,number=n))
print("comprehension:",timeit.timeit(test3,number=n))

output:

appending:     13.14265166100813
assigning:     8.314113713015104
comprehension: 6.283505174011225

as requested, I replaced timeit.timeit(...) with sum(timeit.repeat(..., repeat=7))/7 to get an average time and got this result:

appending:     12.813485399578765
assigning:     8.514678678861985
comprehension: 6.271697525575291

which is not drastically different from my original results.

Tadhg McDonald-Jensen
  • 20,699
  • 5
  • 35
  • 59
  • well that isn't a valid statement, do you mean `timeit.Timer(...` by any chance? or just `timeit.repeat(test1,number=n,repeat=7)`? **running now will post results in a moment.** – Tadhg McDonald-Jensen Apr 11 '16 at 01:28
  • Whoops. Haha yes.. I'm using `timeit.Timer` with a setup. – Jason Apr 11 '16 at 01:29
  • @Signal I added new results but it doesn't differ much from my original ones, I'm using a Mac on Cpython3.5 if you are interested. – Tadhg McDonald-Jensen Apr 11 '16 at 01:38
  • Mind postings all the results instead of the average? I think there are some outliers because I am getting a significantly smaller result. – Jason Apr 11 '16 at 03:26
4

I thought I'd try to answer using CS principles, without timing it empirically, because simply timing it really doesn't give you the gist of why one is better, or whether it is true for other values of N.

Python lists are implemented as arrays. This means that "appending" requires periodic resizing, whereas the blank/reassign option is a single allocation followed by 10,000 O(1) access times. Therefore, in the limit (e.g., for 10K, 100K, 1M, etc), I would expect the second option to be much faster due to all the resizing required by the first option.

For further reading see: How is Python's List Implemented?

mx0
  • 6,445
  • 12
  • 49
  • 54
Tommy
  • 12,588
  • 14
  • 59
  • 110
0

They are both comparable in speed. The list comprehension suggested by @Tadhg is about twice as fast.

These are the timing results:

first case:        10.3030366897583
second case:        9.829667568206787
list comprehension: 5.473726511001587

And this is the source code I used:

from time import time
from random import random

# first case 
iterations = 10000000
start = time()
a=[]
for i in range(iterations):
    a.append(random())
print(time() - start)
# second case
start = time()
a= [0]*iterations
for i in range(iterations):
    a[i] = random()
print(time() - start)
start = time()
a = [random() for _ in range(iterations)]
print(time() - start)
Felix Dombek
  • 13,664
  • 17
  • 79
  • 131