2

What costs more in python, making a list with comprehension or with a standalone function?

It appears I failed to find previous posts asking the same question. While other answers go into detail about bytes and internal workings of python, and that Is indeed helpful, I felt like the visual graphs help to show that there is a continuous trend.

I don't yet have a good enough understanding of the low level workings of python, so those answers are a bit foreign for me to try and comprehend.

HebeleHododo
  • 3,620
  • 1
  • 29
  • 38
tgabb
  • 349
  • 3
  • 12
  • Since you are into performance, and are presumably into large data structures, have a look at generator functions as well. Basically the same syntax as comprehensions, but [depending on the usecase](http://stackoverflow.com/questions/245792/when-is-not-a-good-time-to-use-python-generators) it can be [much faster and/or less expensive in terms of memory](http://stackoverflow.com/questions/102535/what-can-you-use-python-generator-functions-for). – Michel Müller Mar 23 '17 at 05:23
  • 1
    Possible duplicate of [Why is list comprehension so faster?](http://stackoverflow.com/questions/30245397/why-is-list-comprehension-so-faster) – xyres Mar 23 '17 at 05:48
  • Here's an interesting [blog post](http://blog.cdleary.com/2010/04/efficiency-of-list-comprehensions/) that discusses the efficiency of list comprehensions. – xyres Mar 23 '17 at 05:50
  • Thank you for sharing this blog – tgabb Mar 23 '17 at 06:02

1 Answers1

2

I am currently an undergrad in CS, and I am continually amazed with how powerful python is. I recently did a small experiment to test the cost of forming lists with comprehension versus a standalone function. For example:

def make_list_of_size(n):
    retList = []
    for i in range(n):
        retList.append(0)
    return retList

creates a list of size n containing zeros.

It is well known that this function is O(n). I wanted to explore the growth of the following:

def comprehension(n):
    return [0 for i in range(n)]

Which makes the same list.

let us explore!

This is the code I used for timing, and note the order of function calls (which way did I make the list first). I made the list with a standalone function first, and then with comprehension. I have yet to learn how to turn off garbage collection for this experiment, so, there is some inherent measurement error, brought about when garbage collection kicks in.

'''
file: listComp.py
purpose: to test the cost of making a list with comprehension
versus a standalone function
'''
import time as T
def get_overhead(n):
    tic = T.time()
    for i in range(n):
        pass
    toc = T.time()
    return toc - tic


def make_list_of_size(n):
    aList = [] #<-- O(1)
    for i in range(n): #<-- O(n)
        aList.append(n) #<-- O(1)
    return aList #<-- O(1)

def comprehension(n):
    return [n for i in range(n)] #<-- O(?)

def do_test(size_i,size_f,niter,file):
    delta = 100
    size = size_i
    while size <= size_f:
        overhead = get_overhead(niter)

        reg_tic = T.time()
        for i in range(niter):
            reg_list = make_list_of_size(size)
        reg_toc = T.time()

        comp_tic = T.time()
        for i in range(niter):
            comp_list = comprehension(size)
        comp_toc = T.time()

        #--------------------

        reg_cost_per_iter = (reg_toc - reg_tic - overhead)/niter
        comp_cost_pet_iter = (comp_toc - comp_tic - overhead)/niter

        file.write(str(size)+","+str(reg_cost_per_iter)+
            ","+str(comp_cost_pet_iter)+"\n")

        print("SIZE: "+str(size)+ " REG_COST = "+str(reg_cost_per_iter)+
            " COMP_COST = "+str(comp_cost_pet_iter))

        if size == 10*delta:
            delta *= 10
        size += delta

def main():
    fname = input()
    file = open(fname,'w')
    do_test(100,1000000,2500,file)
    file.close()

main()

I did three tests. Two of them were up to list size 100000, the third was up to 1*10^6

See Plots:

Cost of Insertion. Ignore left plot Cost of Insertion. Ignore left plot

Overlay with NO ZOOM

I found these results to be intriguing. Although both methods have a big-O notation of O(n), the cost, with respect to time, is less for comprehension for making the same list.

I have more information to share, including the same test done with the list made with comprehension first, and then with the standalone function.

I have yet to run a test without garbage collection.

tgabb
  • 349
  • 3
  • 12