
There are numerous posts about numpy memory errors in Google land, but I can't find one that resolves my issue. I'm running someone else's software on a high-end server with 256GB of RAM, 64-bit openSUSE 13.1, 64-bit Python, and 64-bit numpy (as far as I can tell). See below.

The original author is not available for help requests, so I did my best to determine the size of the object numpy is attempting to create. First, here is the stack trace:

File "/home/<me>/cmsRelease/trunk/Classes/DotData.py", line 193, in __new__
  DataObj = numpy.rec.fromarrays(Columns,names = names)
File "/usr/lib64/python2.7/site-packages/numpy/core/records.py", line 562, in fromarrays
  _array = recarray(shape, descr)
File "/usr/lib64/python2.7/site-packages/numpy/core/records.py", line 400, in __new__
  self = ndarray.__new__(subtype, shape, (record, descr), order=order)
MemoryError

I used the following for loop to estimate the object size as best I know how:

import sys

size = 0
for i in Columns:  # Columns is the list passed into numpy.rec.fromarrays
    size += sys.getsizeof(i)
print "Columns size: " + str(size)

The result is Columns size: 12051648. Unless I'm mistaken, that's only about 12MB; either way, it's a far cry from 256GB.
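
EDIT (prompted by the comments below): sys.getsizeof on a list only measures the list object itself (its header and pointer array), not the objects it refers to. A minimal illustration, separate from the actual program, on 64-bit CPython 2.7:

import sys

small = ['a', 'b']
big = ['x' * 10 ** 6, 'y' * 10 ** 6]   # two strings of ~1MB each

print sys.getsizeof(small)                 # ~88 bytes
print sys.getsizeof(big)                   # also ~88 bytes; contents are not counted
print sum(sys.getsizeof(s) for s in big)   # ~2MB once the elements themselves are counted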

Based on this information, I suspect a system limit (ulimit) is preventing Python from accessing the memory. Running ulimit -a reports the following (I set ulimit -s 256000000 before running the program; a sketch for checking these limits from inside Python follows the listing):

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 2065541
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 10000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 256000000
cpu time               (seconds, -t) unlimited
max user processes              (-u) 2065541
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
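
For completeness, here is a small sketch (my own addition, using the standard resource module) to confirm which limits the Python process itself sees; RLIMIT_AS corresponds to ulimit -v, RLIMIT_DATA to -d, and RLIMIT_STACK to -s:

import resource

# -1 (resource.RLIM_INFINITY) means the limit is unlimited.
for name in ('RLIMIT_AS', 'RLIMIT_DATA', 'RLIMIT_STACK'):
    soft, hard = resource.getrlimit(getattr(resource, name))
    print name, soft, hard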

Questions:

  1. What am I missing?
  2. Did I not measure the Columns list object size correctly?
  3. Is there another system property I need to set?

I wish the MemoryError were more specific. I appreciate your help.

Supporting system information:

System memory:

> free -h
             total       used       free     shared    buffers     cached
Mem:          252G       1.6G       250G       4.2M        12M        98M
-/+ buffers/cache:       1.5G       250G
Swap:         2.0G        98M       1.9G

OS version:

> cat /etc/os-release
NAME=openSUSE
VERSION="13.1 (Bottle)"
VERSION_ID="13.1"
PRETTY_NAME="openSUSE 13.1 (Bottle) (x86_64)"
ID=opensuse
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:opensuse:13.1"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://opensuse.org/"
ID_LIKE="suse"

Python version:

Python 2.7.6 (default, Nov 21 2013, 15:55:38) [GCC] on linux2
>>> import platform; platform.architecture()
('64bit', 'ELF')

Numpy version:

>>> numpy.version
<module 'numpy.version' from '/usr/lib64/python2.7/site-packages/numpy/version.pyc'>
>>> numpy.version.version
'1.7.1'
  • What is the value of the `shape` argument passed to the `__new__` call? `sys.getsizeof` only returns the size of the container, not the size of its contents, so it's hard to say where the overload is without knowing more about the nature of the data. – BrenBarn Aug 02 '14 at 19:32
  • Assuming that `Columns` is a list of numpy `ndarray`s, use `Columns[n].nbytes` to get the size of the nth column in bytes. – ali_m Aug 02 '14 at 21:10
  • I apologize, but I don't know how to get the value of `shape` since it's part of `numpy`. Can you clarify? Debugging this software is not realistic. It uses four major languages (python, perl, java, and C). Yeah, really. – Mark Ebbert Aug 02 '14 at 21:11
  • @zugzug `numpy.ndarray` and `numpy.matrix` objects have a `.shape` attribute, which is a tuple containing the number of elements in each dimension of the array/matrix. – ali_m Aug 02 '14 at 21:12
  • Looks like `Columns` is a python `list`. I'm getting `'list' object has no attribute 'nbytes'` when I try your first suggestion. EDIT: I guess each item in `Columns` is a python `list`. That's what I'm actually calling `.nbytes` on. – Mark Ebbert Aug 02 '14 at 21:19
  • OK, then you will need `len(Columns)` to get the number of columns, `len(Columns[0])` to get the number of elements in each column, and `sys.getsizeof(Columns[n][0])` to get the size of the first element of the nth column in bytes. Note that due to the overhead of the Python container, this size will almost certainly be much larger than the corresponding numpy element - [see here](http://stackoverflow.com/a/10365639/1461210). It would also be very useful to know what the classes actually are for each column, i.e. `type(Columns[n][0])`. – ali_m Aug 02 '14 at 21:27
  • OK, I'm really embarrassed. In all of my effort, I never checked to see if I was really running out of memory. I started `top` and watched it consume all 256G. I apologize. Now I have much bigger problems. EDIT: And to answer your questions, I found the matrix is 4 X 365644. The objects are small strings (48 bytes). – Mark Ebbert Aug 02 '14 at 22:18
  • @JoeKington Based on the OP's description, it seems that he has nested lists containing scalar values, not numpy arrays. As far as I'm aware, `sys.getsizeof()` would be the correct way to get the size of a built-in scalar (`int`, `float`, `double`, `str` etc.). I also asked him for the class of the scalar elements, since the corresponding numpy array `itemsize` will of course be smaller than the whole Python container. – ali_m Aug 03 '14 at 16:37
  • @ali_m - You're quite right. I missed that point. Sorry for the noise! – Joe Kington Aug 03 '14 at 17:30
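
Putting ali_m's suggestions together, a rough measurement sketch (assuming, as established in the thread, that Columns from the question is a list of lists of scalars):

import sys

print "number of columns:", len(Columns)
print "elements per column:", len(Columns[0])
for n, col in enumerate(Columns):
    print "column %d: type=%s, element size=%d bytes" % (
        n, type(col[0]).__name__, sys.getsizeof(col[0]))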

1 Answer


Really embarrassing. I really was running out of memory. I started top and watched all 256GB get consumed. Why I never checked that during all my investigation is a mystery even to me. My apologies for overlooking the obvious.
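
If it helps anyone else, here is a rough sketch (my addition, not something I ran at the time) for confirming actual memory consumption from inside the process instead of watching top; on Linux, ru_maxrss is reported in kilobytes:

import resource

def peak_rss_kb():
    # Peak resident set size of this process so far (kB on Linux).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

print "peak RSS before: %d kB" % peak_rss_kb()
# ... the failing call, e.g. numpy.rec.fromarrays(Columns, names=names) ...
print "peak RSS after: %d kB" % peak_rss_kb()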
