
I am using NumPy and trying to create a huge matrix. While doing this, I receive a MemoryError.

Because the matrix itself is not important, I will just show how to easily reproduce the error.

import numpy as np

a = 10000000000
data = np.array([float('nan')] * a)  # raises MemoryError

Not surprisingly, this throws a MemoryError.

There are two things I would like to point out:

  1. I really need to create and use a big matrix.
  2. I think I have enough RAM to handle this matrix (I have 24 GB of RAM).

Is there an easy way to handle big matrices in numpy?

Just to be on the safe side, I previously read these posts (which sound similar):

Very large matrices using Python and NumPy

Python/Numpy MemoryError

Processing a very very big data set in python - memory error

P.S. Apparently I have some problems with multiplying and dividing numbers, which made me think that I had enough memory. So I think it is time for me to go to sleep, review my math, and maybe buy some memory.

Maybe in the meantime some genius will come up with an idea for how to actually create this matrix using only 24 GB of RAM.

Why I need this big matrix: I am not going to do any manipulations with this matrix. All I need to do with it is save it into PyTables.
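Since the goal is just to land the data in PyTables, one option is to build an extendable array on disk and append it chunk by chunk, so the full array never has to fit in RAM. A minimal sketch (the file name data.h5 and the chunk size are arbitrary choices, not from the original post):

import numpy as np
import tables

n = 10000000000
chunk_len = 10000000  # 10 million float64 values, ~80 MB per chunk

with tables.open_file("data.h5", mode="w") as f:
    # Extendable along the first dimension; grows on disk as we append.
    earray = f.create_earray(f.root, "data", atom=tables.Float64Atom(), shape=(0,))
    chunk = np.full(chunk_len, np.nan)
    for _ in range(n // chunk_len):
        earray.append(chunk)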

Salvador Dali
    How do you expect to fit 10 billion floats in 24 GB? If a float were 2.4 bytes, and 100% of your RAM were devoted to holding this array - sure ;-) – Tim Peters Sep 30 '13 at 00:56
  • What do you need to do with this matrix? That might give an insight to a workaround. – Rohit Sep 30 '13 at 01:14
  • Can't you save it piece by piece and work on partitions of your data? – usethedeathstar Sep 30 '13 at 07:20
  • Also, the way you create it first builds a Python list of that size. The float is always the same object, but the list itself takes as much memory as the resulting array (a pointer is 8 bytes and a double is 8 bytes). So use `np.empty` plus `ndarray.fill` to create arrays, especially in tight memory situations. – seberg Sep 30 '13 at 08:47
  • (Though for arrays that large, you should likely not load them into RAM directly anyway, but use HDF5 or memory mapping or similar.) – seberg Sep 30 '13 at 08:49
  • You can use `numpy.memmap` (see the sketch below). Please, [check this answer for some more description...](http://stackoverflow.com/a/16633274/832621) – Saullo G. P. Castro Sep 30 '13 at 10:48
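A minimal sketch of the `numpy.memmap` approach from the last comment (the file name data.dat, the float32 dtype, and the chunk size are illustrative assumptions):

import numpy as np

n = 10000000000
# File-backed array: the OS pages it in and out, so it need not fit in RAM.
# Note that this creates a ~37 GiB file on disk.
data = np.memmap("data.dat", dtype=np.float32, mode="w+", shape=(n,))

chunk = 10000000
for start in range(0, n, chunk):
    data[start:start + chunk] = np.nan  # fill piecewise to keep RAM use low
data.flush()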

3 Answers


Assuming each floating point number is 4 bytes, you'd have

(10000000000 * 4) / (2**30.0) = 37.25290298461914

or about 37.25 GiB to store in memory. So 24 GB of RAM is not enough. Note that NumPy's default float64 actually takes 8 bytes per element, which doubles the requirement to about 74.5 GiB.
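You can verify the arithmetic from the dtype's item size (a quick sketch using NumPy itself):

import numpy as np

n = 10000000000
for dt in (np.float32, np.float64):
    gib = n * np.dtype(dt).itemsize / 2**30
    print(np.dtype(dt).name, round(gib, 2), "GiB")

# float32 37.25 GiB
# float64 74.51 GiB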

Eric Urban

If you can't afford to create such a matrix but still wish to do some computations, try sparse matrices.
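For example, a minimal sketch with SciPy's `dok_matrix` (the shape and values here are illustrative; only the entries you actually set consume memory):

from scipy.sparse import dok_matrix

# A huge logical shape costs almost nothing until entries are assigned.
m = dok_matrix((10000000, 10000000), dtype=float)
m[0, 0] = 1.0
m[12345, 67890] = 2.5
print(m.nnz)  # 2 stored entries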

If you wish to pass it to another Python package that uses duck typing, you can create your own class with `__getitem__` implementing dummy access.
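A minimal sketch of such a stand-in (the class name DummyMatrix and the constant value are made up for illustration):

class DummyMatrix:
    # Pretends to be a huge matrix; every lookup returns the same
    # constant, so no storage is needed.
    def __init__(self, shape, value=float('nan')):
        self.shape = shape
        self.value = value

    def __getitem__(self, index):
        return self.value

m = DummyMatrix((100000, 100000))
print(m[42, 37])  # nan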

Tigran Saluev

If you use the PyCharm editor for Python, you can change its memory settings in

C:\Program Files\JetBrains\PyCharm 2018.2.4\bin\pycharm64.exe.vmoptions

Lowering PyCharm's own memory allocation in this file leaves more memory for your program. You need to edit these lines:

-Xms1024m
-Xmx2048m
-XX:ReservedCodeCacheSize=960m

For example, change them to -Xms512m and -Xmx1024m. Your program will then have more memory to work with, but it will affect debugging performance in PyCharm.

Ottoman Empire