
I'm dealing with a big array D that is causing me memory problems. However, the entries of that big array are in fact just copies of elements of a much smaller array B. My idea is to use something like a "dynamic view" into B instead of constructing the full D. For example, is it possible to use a function D_fun like an array, i.e. one which reads the correct element of B? Something like:

def D_fun(B, I, J):
    # map the indices (I, J) of the big array to indices into the small B
    i = convert_i_idx(I, J)
    j = convert_j_idx(I, J)
    return B[i, j]

And then I could use D_fun to do some matrix and vector multiplications.

Of course, anything else that would keep me from copying the elements of B repeatedly into a huge matrix would be appreciated.

Edit: I realized that if I invest some time in my other code, I can get the matrix D to be a block matrix with copies of B on the diagonal and zeros elsewhere.

obachtos

2 Answers


This is usually done by subclassing numpy.ndarray and overloading __getitem__, __setitem__ and __delitem__ (array-like access via []) to remap the indices the way your D_fun(..) does. Still, I am not sure whether this will work in combination with the numpy parts implemented in C.
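
A minimal sketch of what that could look like. The convert_* functions are made-up placeholders for your actual mapping, and the zero-stride buffer trick is just one way to avoid allocating D's full buffer:

import numpy as np

# Hypothetical index-mapping functions -- stand-ins for whatever
# relation holds between D's indices and B's indices in your problem.
def convert_i_idx(I, J):
    return I % 3

def convert_j_idx(I, J):
    return J % 3

class DynamicView(np.ndarray):
    def __new__(cls, B, shape):
        # Back the large logical shape with a single dummy element and
        # all-zero strides, so no big buffer is actually allocated.
        obj = super().__new__(cls, shape, dtype=B.dtype,
                              buffer=np.zeros(1, dtype=B.dtype),
                              strides=(0,) * len(shape))
        obj._B = B
        return obj

    def __getitem__(self, key):
        I, J = key  # simplification: only scalar (I, J) access
        return self._B[convert_i_idx(I, J), convert_j_idx(I, J)]

    def __setitem__(self, key, value):
        I, J = key
        self._B[convert_i_idx(I, J), convert_j_idx(I, J)] = value

B = np.arange(9.0).reshape(3, 3)
D = DynamicView(B, (600, 600))  # "huge" D, but only B is stored
print(D[4, 5])                  # reads B[1, 2] -> 5.0

Note that compiled routines such as np.dot never call a Python-level __getitem__, so this only helps for element access from Python code, which is exactly the C-implementation worry above.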

Some concerns: when you're doing calculations on your big matrix D via the small matrix B, numpy might create a copy of D with its real dimensions, thus using more space than you wanted to avoid.

If several index pairs (I1,J1), (I2,J2), ... are mapped to the same (i,j), then D[I1,J1] = newValue will also set D[I2,J2] to newValue.

nitzel
  • I'll give that a try! Your second concern is not an issue: I probably won't change the array once created, and even if I had to, that would be the desired behavior. – obachtos Feb 03 '17 at 12:30

np.dot uses compiled libraries to perform fast matrix products. That constrains the data type (integers, floats) and requires that the data be contiguous. I'd suggest studying this recent question about large dot products: numpy: efficient, large dot products
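
For example, a strided view is neither C- nor F-contiguous, and np.dot has to gather it into a contiguous buffer before the BLAS routine can run (a small illustration):

import numpy as np

a = np.arange(12.0).reshape(3, 4)
c = a[:, ::2]  # strided view: neither C- nor F-contiguous
print(c.flags['C_CONTIGUOUS'], c.flags['F_CONTIGUOUS'])  # False False
# np.dot still works, but only after copying c into a contiguous
# buffer behind the scenes
print(np.dot(c, c.T))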

Defining a class with a custom __getitem__ is a way of accessing an object with indexing syntax. Look in numpy/lib/index_tricks.py for some interesting examples of this: np.mgrid, np.r_, np.s_, etc. But this is largely a syntax enhancement. It doesn't avoid the issue of defining a robust and efficient mapping between your D and B.
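
For instance, np.s_ and np.r_ are small classes whose main feature is that custom __getitem__:

import numpy as np

print(np.s_[2:9:3])    # slice(2, 9, 3) -- just hands back the slice object
print(np.r_[0:4, 10])  # [ 0  1  2  3 10]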

And before trying to do much with subclassing ndarray, take a look at the implementations of np.matrix or np.ma. scipy.sparse also creates classes that behave like ndarray in many ways, but it does not subclass ndarray.

In your D_fun, are I and J scalars? If so, this conversion would be horribly inefficient. It would be better if they could be arrays, lists or slices (anything that B[atuple] implements), but that can be a lot of work.

def D_fun(B, I, J):
    i = convert_i_idx(I, J)
    j = convert_j_idx(I, J)
    return B[i, j]

def __getitem__(self, atuple):
    # sketch of a getitem version of your function
    I, J = atuple
    # select the relevant B based on I, J (if there is more than one)
    i = convert_i_idx(I, J)
    j = convert_j_idx(I, J)
    return self.B[i, j]

What is the mapping from D to B like? The simplest and most efficient mapping would be that D is just a higher-dimensional collection of Bs, i.e.

D = np.array([B0, B1, B2, ..., Bn])
D[0, ...] == B0
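
If all those blocks are actually the same B, as in your edit, such a stack doesn't even need a copy: np.broadcast_to produces a zero-stride view (a small sketch):

import numpy as np

B = np.arange(6.0).reshape(2, 3)
# 1000 logical copies of B along a new axis; the new axis gets
# stride 0, so no data is duplicated
D = np.broadcast_to(B, (1000, 2, 3))
print(D.shape)                 # (1000, 2, 3)
print(np.shares_memory(D, B))  # True -- still only B's 6 floats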

Slightly more complicated is the case where D[n1:n2, ...] == B0, i.e. each B occupies a slice of D.

But if the B0 values are scattered around D, your chances of an efficient, reliable mapping are very small.

hpaulj
  • I'm not sure I understood the last part or how it could help me. I realized that `D` can be written in block-diagonal form (see edit above). Does that work in that direction? – obachtos Feb 06 '17 at 21:41
  • Take a look at `scipy.sparse` `block_diag` construction. – hpaulj Feb 07 '17 at 00:07
  • Mhh, but how could that exploit the fact that the matrices on the diagonal are identical in order to save memory? – obachtos Feb 07 '17 at 12:42
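
A sketch of how that last point could work: scipy.sparse.block_diag([B] * n) still stores every block, but a scipy.sparse.linalg.LinearOperator can hold the single B and apply the block-diagonal product on the fly (illustrative code, not from the thread):

import numpy as np
from scipy.sparse.linalg import LinearOperator

B = np.arange(9.0).reshape(3, 3)
n = 4  # number of identical diagonal blocks

def matvec(x):
    # D = diag(B, B, ..., B): multiply each length-3 chunk of x
    # by the single stored B
    return (x.reshape(n, B.shape[1]) @ B.T).ravel()

D = LinearOperator((n * B.shape[0], n * B.shape[1]), matvec=matvec)
x = np.ones(n * 3)
print(D.dot(x))  # same result as block_diag([B] * n).dot(x)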