
This code produces a MemoryError:

from pylab import complex128
import numpy

x = numpy.empty(100000000, dtype=complex128)    # 100 million complex128 values

I have Win7 64-bit with 8 GB RAM (at least 5.3 GB free when running this code). I'm using Python 2.7 (Anaconda) and I think it is the 32-bit version. Even with 32 bits, we should be able to handle 1.6 GB!

Do you know how to solve this?

PS: I expected an array of 100 million items, each one using 16 bytes (128 bits), to use 16 * 100 million = 1.6 GB. This is confirmed by:

x = numpy.empty(1000000, dtype=complex128)    # 1 million items here
print x.nbytes    # prints 16000000, i.e. 16 MB
Basj
  • How do you know the list is only 1.5 GB? Did you measure process memory use before trying to allocate the extra 1.5 GB? How much memory was already in use? 32-bit means you have a theoretical limit of 2 GB per process. – Martijn Pieters Nov 22 '13 at 17:27
  • 128 bits = 16 bytes per item. 100 million times 16 is approx. 1.6 GB. – Basj Nov 22 '13 at 17:27
  • That presumes there is no overhead. – Martijn Pieters Nov 22 '13 at 17:28
  • What do you mean by no overhead? – Basj Nov 22 '13 at 17:29
  • The `empty()` type has to keep track of information too. – Martijn Pieters Nov 22 '13 at 17:29
  • Is the -2 because I'm using Windows, or because the question is not interesting? If so, why is it not interesting? – Basj Nov 22 '13 at 17:32
  • `empty()` is a Python object implemented in C; it'll have a struct where bookkeeping information is stored, and a C datastructure to *hold* the 100 million values. – Martijn Pieters Nov 22 '13 at 17:33
  • If 1 KB of RAM is used in every single GB of RAM you have, you will not be able to allocate a good chunk of memory (e.g. 2 GB). – JoeC Nov 22 '13 at 17:33
  • I did not vote on your question, I don't know why people voted on this the way they did. You show no evidence in your question however as to how you got to the 1.6 GB figure or any attempts to find what *can* fit in memory. – Martijn Pieters Nov 22 '13 at 17:34
  • I added a clarification to my question about how I found 1.6 GB. By the way, is there an easy way to find the largest complex128 array that can fit in memory? – Basj Nov 22 '13 at 17:38
  • @MartijnPieters Well, that leaves about 500 MiB for "overhead". Python itself only takes about 1.6 MiB, and the imported modules shouldn't amount to more than another few MiB. The administrative overhead of the NumPy array should be negligible. I can't argue with the `MemoryError`, but "overhead" seems an unlikely cause to me. This smells like virtual address space fragmentation. – Nov 22 '13 at 17:46
  • This looks relevant: http://stackoverflow.com/questions/18282867/python-32-bit-memory-limits-on-64bit-windows – Warren Weckesser Nov 22 '13 at 17:47
  • `.nbytes` is the total size of the managed array, but not the total allocated memory size for the object. I am not familiar enough with numpy to make any definite statements about memory footprints though. – Martijn Pieters Nov 22 '13 at 17:49
  • You are asking for 1.6 GB of *contiguous* memory in a 2 GB virtual address space; it's enough to have a single memory page allocated anywhere in the range 400 MB - 1600 MB to give you this problem. You could check the memory layout of your Python process; IIRC even Process Explorer can do that (see the first sketch after these comments). – Matteo Italia Nov 22 '13 at 17:52
  • Thanks for these comments, they help me understand! What is the solution then? For Windows users, which version would you use to minimize breakage of my existing code? Python Anaconda 64-bit? – Basj Nov 22 '13 at 17:56
  • Why not use the 64-bit version of Python? – Tim Pietzcker Nov 22 '13 at 18:05
  • If it truly is memory fragmentation you might get around it by allocating multiple smaller arrays (see the second sketch after these comments). – Mark Ransom Nov 22 '13 at 19:06
  • If I install the 64-bit version, will all the preinstalled libraries from 32-bit Python Anaconda automatically work with the newly installed 64-bit one? – Basj Nov 22 '13 at 20:44
  • @Basj If you install 64-bit Python, all the compiled packages will probably have to be re-installed. But it is worth it: I recently moved to 64-bit and got a significant performance gain and fewer memory issues... – Saullo G. P. Castro Nov 22 '13 at 22:08
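
A couple of the suggestions above are easy to try out in code. For Matteo Italia's point about inspecting the memory layout: on Windows you can walk the process's virtual address space with VirtualQuery through ctypes and report the largest free region, which is an upper bound on any single contiguous allocation. A minimal sketch (Windows-only; the helper name is mine, not from the thread):

import ctypes
from ctypes import wintypes

class MEMORY_BASIC_INFORMATION(ctypes.Structure):
    _fields_ = [("BaseAddress", ctypes.c_void_p),
                ("AllocationBase", ctypes.c_void_p),
                ("AllocationProtect", wintypes.DWORD),
                ("RegionSize", ctypes.c_size_t),
                ("State", wintypes.DWORD),
                ("Protect", wintypes.DWORD),
                ("Type", wintypes.DWORD)]

MEM_FREE = 0x10000

def largest_free_region():
    # Walk the address space region by region; VirtualQuery returns 0
    # once we step past the end of user space, which ends the loop.
    kernel32 = ctypes.windll.kernel32
    mbi = MEMORY_BASIC_INFORMATION()
    address, biggest = 0, 0
    while kernel32.VirtualQuery(ctypes.c_void_p(address),
                                ctypes.byref(mbi), ctypes.sizeof(mbi)):
        if mbi.State == MEM_FREE:
            biggest = max(biggest, mbi.RegionSize)
        address += mbi.RegionSize
    return biggest

print(largest_free_region())    # in bytes; run inside the 32-bit process

And for Basj's question about the largest array that still fits, together with Mark Ransom's workaround of allocating several smaller arrays, a sketch that probes by trial allocation (the result is a snapshot, since it depends on the current fragmentation; again the helper name is illustrative):

import numpy

def largest_allocatable(dtype=numpy.complex128, hi=10**9):
    # Binary-search the largest item count that numpy can allocate right
    # now; the search is capped at 10**9 items (16 GB of complex128).
    lo = 0
    while lo < hi:
        mid = (lo + hi + 1) // 2
        try:
            a = numpy.empty(mid, dtype=dtype)
            del a          # free immediately so the probe itself
            lo = mid       # does not fragment the address space further
        except MemoryError:
            hi = mid - 1
    return lo

print(largest_allocatable())   # item count; multiply by 16 for bytes

# If one contiguous 1.6 GB block fails, ten ~160 MB blocks may still fit,
# because each chunk only needs a much smaller contiguous hole:
chunks = [numpy.empty(10000000, dtype=numpy.complex128) for _ in range(10)]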

2 Answers


The problem was solved by switching to 64-bit Python.

It's even possible to create a single array of more than 5 GB.

Note: when I create an array that should use 1 600 000 000 bytes (a complex128 array with 100 million items), the actual memory usage is not "much" more: 1 607 068 KB...
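
Not from the original answer, but two quick checks make this concrete: the interpreter's pointer size tells you whether you are running 32-bit or 64-bit Python, and nbytes confirms the size of the array from the question:

import struct
import numpy

# Pointer size: 4 bytes on a 32-bit interpreter, 8 bytes on a 64-bit one.
print(struct.calcsize("P") * 8)    # prints 32 or 64

# On 64-bit Python the allocation from the question succeeds:
x = numpy.empty(100000000, dtype=numpy.complex128)
print(x.nbytes)                    # 1600000000 bytes, i.e. 1.6 GB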

Basj

I know it's an old question, and I know there are many similar questions, e.g. Memory for python.exe on Windows 7 python 32 - Numpy uses half of the available memory only? But none of them seems to really solve the issue.

Using the hint given here https://stackoverflow.com/a/18282931/566035, I think I finally fixed this issue.

First, you need to install "Microsoft Visual C++ Express Edition 2008". You can follow the instructions given here: http://blog.victorjabur.com/2011/06/05/compiling-python-2-7-modules-on-windows-32-and-64-using-msvc-2008-express/

The download URL for Microsoft Visual C++ Express Edition 2008 in the above blog article is dead, but you can find a working one here: Visual C++ 2008 Express Download Link Dead?.

(EDIT) I confirmed that the linker that comes with msvc-2010-express also works.

Then, launch the Visual Studio 2008 Command Prompt from Start menu -> Microsoft Visual C++ 2008 Express Edition -> Visual Studio Tools -> Visual Studio 2008 Command Prompt

Then do these commands:

cd bin
editbin.exe /LARGEADDRESSAWARE "C:\Python27\python.exe"

This will set the IMAGE_FILE_LARGE_ADDRESS_AWARE flag in the Python executable. With this magic, 32-bit Python can use up to 4 GB of address space (instead of ~2 GB on Windows).
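
To check that editbin actually set the bit, you can read the Characteristics field of the PE header yourself. A small sketch (mine, not part of the original answer) that assumes the standard PE layout and the same install path as in the command above:

import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020

def is_large_address_aware(path):
    # The DOS header stores the offset of the "PE\0\0" signature at 0x3C;
    # the COFF Characteristics field sits 18 bytes into the file header,
    # which begins right after the 4-byte signature.
    with open(path, "rb") as f:
        f.seek(0x3C)
        pe_offset = struct.unpack("<I", f.read(4))[0]
        f.seek(pe_offset + 4 + 18)
        characteristics = struct.unpack("<H", f.read(2))[0]
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)

print(is_large_address_aware(r"C:\Python27\python.exe"))   # True once flagged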

According to MSDN:

On 64-bit editions of Windows, 32-bit applications marked with the IMAGE_FILE_LARGE_ADDRESS_AWARE flag have 4 GB of address space available.

Now,

x = numpy.empty(100000000, dtype=complex128)

actually works on my Windows 7 64-bit PC with 32-bit Python 2.7.

I really hope the official Python binary will be shipped with this flag already set, as there is no harm in doing so, only a huge benefit!

As MSDN says:

Setting this flag and then running the application on a system that does not have 4GT support should not affect the application.

otterb