
Possible Duplicate:
Python Numpy Very Large Matrices

I tried `numpy.zeros((100000, 100000))` and it failed with "array is too big".

Response to comments:

1. I could create a 10k × 10k matrix, but not 100k × 100k or 1 million × 1 million.
2. The matrix is not sparse.

nsredar

3 Answers


We can do simple maths to find out. A 1 million by 1 million matrix has 1,000,000,000,000 elements. If each element takes up 4 bytes, it would require 4,000,000,000,000 bytes of memory. That is, 3.64 terabytes.

There's also a chance that a given implementation of Python uses more than that for a single number. For instance, just the leap from a float to a double means you'd need 7.28 terabytes instead. (There's also a chance that Python stores the number on the heap and all you get is a pointer to it, roughly doubling the footprint, without even taking metadata into account. But that's slippery ground; I'm always wrong when I talk about Python internals, so let's not dig into it too much.)

I suppose numpy doesn't have a hardcoded limit, but if your system doesn't have that much free memory, there isn't really anything you can do.

zneak
  • Python does use more than that for a single number (about 10 bytes IIRC), but numpy is written in C and Fortran. – Nick ODell Jun 14 '11 at 04:34
  • Just as a side note, the point of numpy is that arrays are compact in memory, so a `numpy.float32` array only takes 4 bytes per element (plus a tiny bit of constant overhead for the whole array). What you said is quite true for python lists, though! – Joe Kington Jun 14 '11 at 04:36
  • 1
    People are usually wrong when they ponder the internals of any complex system. Including the ones they built themselves. – Nicholas Knight Jun 14 '11 at 04:39
  • 1
    Strictly (according to ISO) that's 3.64 _tibibytes_ and 4 terabytes. But I hate the word “tibibyte” so +1. :-) – Donal Fellows Jun 14 '11 at 08:34

Does your matrix have a lot of zero entries? I suspect it does; few people work with dense problems that large.

If so, you can handle it easily with a sparse matrix. SciPy has a good set of them built in (http://docs.scipy.org/doc/scipy/reference/sparse.html). The space required by a sparse matrix grows with the number of nonzero elements, not with the dimensions.

Adam

Your system probably won't have enough RAM to hold the matrix, but nowadays you might well have a few terabytes of free disk space. In that case, numpy.memmap would allow you to keep the array on disk while treating it as if it resided in memory.

However, it's probably best to rethink the problem. Do you really need a matrix this large? Any computation involving it will probably be infeasibly slow and will need to be done blockwise.

pv.