
I'm loading large HDF5 (.h5) files into memory as numpy ndarrays. I've read that my system (Win 7 Professional, 6 GB RAM) should allow python.exe to use about 2 GB of physical memory.
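
For illustration, reading an HDF5 dataset fully into memory usually looks something like the sketch below (this assumes h5py; 'data.h5' and 'dataset' are placeholder names, not my actual files):

import h5py

# sketch only: 'data.h5' and 'dataset' are placeholders for the real file and dataset
with h5py.File('data.h5', 'r') as f:
    arr = f['dataset'][:]  # slicing with [:] reads the whole dataset into one ndarray
print 'loaded array of shape %s, %0.2f MB' % (str(arr.shape), arr.nbytes / 2.**20)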

However, I'm getting a MemoryError already just shy of 1 GB. Even stranger, this lower limit seems to apply only to numpy arrays, not to lists.

I've tested my memory consumption using the following function found here:

import psutil
import gc
import os
import numpy as np
from matplotlib.pyplot import pause

def memory_usage_psutil():
    # return the resident memory usage (RSS) in MB
    process = psutil.Process(os.getpid())
    mem = process.get_memory_info()[0]/float(2**20)
    return mem
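
(Side note for anyone running this with a current psutil: get_memory_info() was renamed to memory_info() in psutil 2.0, so a version-tolerant variant of the helper, reusing the imports above, could look like this; the fallback logic is my own addition, not part of the snippet I found.)

def memory_usage_psutil_compat():
    # return the resident set size (RSS) of this process in MB
    process = psutil.Process(os.getpid())
    try:
        mem_bytes = process.memory_info().rss       # psutil >= 2.0
    except AttributeError:
        mem_bytes = process.get_memory_info()[0]    # older psutil, as in the function above
    return mem_bytes / float(2**20)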

Test 1: Testing memory limits for an ordinary list

print 'Memory - %d MB' %memory_usage_psutil() # prints memory usage after imports
a = []
while 1:
    try:
        a.append([x*2000 for x in xrange(10000)])
    except MemoryError:
        print 'Memory - %d MB' %memory_usage_psutil()
        a = []
        print 'Memory - %d MB' %memory_usage_psutil()
        print 'run garbage collector: collected %d objects.' %gc.collect()
        print 'Memory - %d MB\n\n' %memory_usage_psutil()
        break

Test 1 prints:

Memory - 39 MB
Memory - 1947 MB
Memory - 1516 MB
run garbage collector: collected 0 objects.
Memory - 49 MB

Test 2: Creating a number of large numpy arrays

shape = (5500,5500)
names = ['b', 'c', 'd', 'g', 'h']

try:
    for n in names:
        globals()[n] = np.ones(shape, dtype='float64')
        print 'created variable %s with %0.2f MB'\
        %(n,(globals()[n].nbytes/2.**20))
except MemoryError:
    print 'MemoryError, Memory - %d MB. Deleting files..'\
    %memory_usage_psutil()
    pause(2)
    # Just added the pause here to be able to observe
    # the spike of memory in the Windows task manager.
    for n in names:
        globals()[n] = []
    print 'Memory - %d MB' %memory_usage_psutil()
    print 'run garbage collector: collected %d objects.' %gc.collect()
    print 'Memory - %d MB' %memory_usage_psutil()

Test 2 prints:

Memory - 39 MB
created variable b with 230.79 MB
created variable c with 230.79 MB
created variable d with 230.79 MB
created variable g with 230.79 MB
MemoryError, Memory - 964 MB. Deleting files..
Memory - 39 MB
run garbage collector: collected 0 objects.
Memory - 39 MB

My question: Why do I get a MemoryError well before I'm anywhere near the 2 GB limit, and why is there a difference in the memory limit between a list and a numpy array? What am I missing? I'm using Python 2.7 and numpy 1.7.1.
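
(One sanity check I can add here, not part of the tests above: confirming that the interpreter itself is 32-bit, since that is what caps the usable address space regardless of the 6 GB of RAM.)

import sys
import platform

# a 32-bit CPython reports '32bit' and sys.maxsize == 2**31 - 1
print platform.architecture()[0]
print 'sys.maxsize = %d' % sys.maxsize
print 'running a 64-bit interpreter: %s' % (sys.maxsize > 2**32)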

Betrieb
  • I guess I could work around this by appending all the arrays to a list, but then I would have to change the code that accesses the arrays, and it certainly wouldn't be a nice solution. – Betrieb Oct 03 '13 at 10:22
  • It's time for you to go 64-bit. Use http://www.lfd.uci.edu/~gohlke/pythonlibs or 64-bit cygwin. – user57368 Oct 04 '13 at 01:25
  • Yip, eventually I gave up and am now on 64-bit. Should have done that in the first place; it would have saved me so much hassle. – Betrieb Oct 09 '13 at 12:17

1 Answer


This is probably happening because the numpy array is backed by a single C array (for speed), which is allocated somewhere with one malloc call. That call fails because it cannot find a contiguous block of memory of the requested size. I am further guessing that Python lists are implemented as a linked list, so the memory needed for a list does not have to be contiguous. Hence, if you have enough memory available but it is fragmented, the array malloc fails while the linked list lets you use all of the non-contiguous pieces.
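
A rough way to see the contiguity point in practice (a sketch of my own, not something the asker ran; the 900 MB / 10 MB figures are arbitrary): request the same total amount of memory once as a single numpy array and once as many small arrays. On a fragmented 32-bit address space the single large request tends to fail first.

import numpy as np

total_mb, chunk_mb = 900, 10

# one allocation: needs ~900 MB of *contiguous* address space
try:
    big = np.ones(total_mb * 2**20 // 8, dtype='float64')
    print 'single %d MB array: OK' % total_mb
    del big  # release it so the second test starts from the same state
except MemoryError:
    print 'single %d MB array: MemoryError' % total_mb

# many small allocations: each one only needs ~10 MB of contiguous space
chunks = []
try:
    for _ in xrange(total_mb // chunk_mb):
        chunks.append(np.ones(chunk_mb * 2**20 // 8, dtype='float64'))
    print '%d arrays of %d MB each: OK' % (len(chunks), chunk_mb)
except MemoryError:
    print 'MemoryError after %d arrays of %d MB each' % (len(chunks), chunk_mb)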

Tommy
  • A Python list is an array (not a linked list). The problem, I guess, is that multidimensional lists in Python aren't contiguous, because the parent list only holds pointers to the child lists, while a numpy multidimensional array really is one big array and therefore has to be contiguous. – Lie Ryan Oct 04 '13 at 03:18
  • Thanks a lot for the clear explanation. I changed my test 2 to load a larger number of smaller arrays (~60 MB per array) and I only get a MemoryError at around 1.5 GB. This behaviour suggests that the smaller numpy arrays are able to find contiguous space in memory. When I reduce the array size further to about 10 MB, I'm able to utilize 1817 MB of memory. – Betrieb Oct 04 '13 at 09:39