29

Problem

Let's assume that I want to find n**2 for all numbers smaller than 20000000.

General setup for all three variants that I test:

import time, psutil, gc

gc.collect()
mem_before = psutil.virtual_memory()[3]
time1 = time.time()

# (comprehension, generator, function)-code comes here

time2 = time.time()
mem_after =  psutil.virtual_memory()[3]

print "Used Mem = ", (mem_after - mem_before)/(1024**2)  # convert Byte to Megabyte
print "Calculation time = ", time2 - time1

Three options to calculate these numbers:

1. Creating a list of via comprehension:

x = [i**2 for i in range(20000000)]

It is really slow and time consuming:

Used Mem =  1270  # Megabytes
Calculation time =  33.9309999943  # Seconds

2. Creating a generator using '()':

x = (i**2 for i in range(20000000))

It is much faster than option 1, but still uses a lot of memory:

Used Mem =  611 
Calculation time =  0.278000116348 

3. Defining a generator function (most efficient):

def f(n):
    i = 0
    while i < n:
        yield i**2
        i += 1
x = f(20000000)

Its consumption:

Used Mem =  0
Calculation time =  0.0

The questions are:

  1. What's the difference between the first and second solutions? Using () creates a generator, so why does it need a lot of memory?
  2. Is there any built-in function equivalent to my third option?
Azat Ibrakov
  • 9,998
  • 9
  • 38
  • 50
EbraHim
  • 2,279
  • 2
  • 16
  • 28
  • 10
    use `xrange` in 2.7 for less memory usage. – YOU May 11 '16 at 08:11
  • 6
    That is it - `range` creates a list anyways in Python 2.x. See [What is the difference between range and xrange functions in Python 2.X?](http://stackoverflow.com/questions/94935/what-is-the-difference-between-range-and-xrange-functions-in-python-2-x) – miradulo May 11 '16 at 08:12
  • 14
    `() need a lot of memory`. This statement is so wrong. – AKS May 11 '16 at 08:16
  • @AKS two bytes used to be considered a lot :P – Jules May 11 '16 at 13:32
  • 1
    @Jules yes indeed! it `used to be` :) – AKS May 11 '16 at 13:37
  • This seems pretty basic, hence down-voted. At least should have done some memory profiling or read the docs. – denfromufa May 16 '16 at 02:44
  • Your third test, `x = f(20000000)`, does nothing, it doesn't iterate. Use something like `list(f(...))` or `max(f(...))` to force all the values to actually be generated. – J_H Apr 01 '20 at 16:37

2 Answers2

55
  1. As others have pointed out in the comments, range creates a list in Python 2. Hence, it is not the generator per se that uses up the memory, but the range that the generator uses:

    x = (i**2 for i in range(20000000))  
    # builds a 2*10**7 element list, not for the squares , but for the bases
    
    >>> sys.getsizeof(range(100))
    872
    >>> sys.getsizeof(xrange(100))
    40
    >>> sys.getsizeof(range(1000))
    8720
    >>> sys.getsizeof(xrange(1000))
    40
    >>> sys.getsizeof(range(20000000))
    160000072
    >>> sys.getsizeof(xrange(20000000))
    40
    

    This also explains why your second version (the generator expression) uses around half the memory of the first version (the list comprehension) as the first one builds two lists (for the bases and the squares) while the second only builds one list for the bases.

  2. xrange(20000000) thus, greatly improves memory usage as it returns a lazy iterable. This is essentially the built-in memory efficient way to iterate over a range of numbers that mirrors your third version (with the added flexibility of start, stop and step):

    x = (i**2 for i in xrange(20000000))
    

    In Python 3, range is essentially what xrange used to be in Python 2. However, the Python 3 range object has some nice features that Python 2's xrange doesn't have, like O(1) slicing, contains, etc.

Some references:

Toby Speight
  • 27,591
  • 48
  • 66
  • 103
user2390182
  • 72,016
  • 6
  • 67
  • 89
  • Saying it 'should work much more memory efficiently' is a bit weak of an answer IMHO. Maybe explain how `range` creates a list in memory, and how that affects OP's answer 2? – miradulo May 11 '16 at 08:18
  • @DonkeyKong: cultural thing. If it doesn't use a lot of memory, it's good, but there's no reason to *compliment* a solution by bringing absolute numbers. Or, *ned gebruddeld isch gnug g'lobt*, like schwobaseggl would probably say. – Marcus Müller May 11 '16 at 08:34
  • @MarcusMüller I didn't intend to get the answer to bring absolute numbers, I just felt like the explanation was lacking a little. Though I don't mind it, nice answer :) +1 – miradulo May 11 '16 at 08:38
1

1.- The object must be created in memory, so in your second solution, the generator is created but not computed, but still has memory, python probably reserve some memory for its computation to be efficient, we don't know about the interpreter magic, also notice that range funtion creates the full list from 0 to 200000, so in fact you are still building that list in memory.

2.- You can use itertool.imap:

squares = itertools.imap(lambda x: x**2, xrange(200000))
Netwave
  • 40,134
  • 6
  • 50
  • 93