Is n potentially large, so the result is a large sparse matrix with nonzero values concentrated along the diagonal? Sparse matrices are designed with this kind of matrix in mind (from FD and FE PDE problems). I did this a lot in MATLAB, and some with the scipy sparse module.
That module has a block definition mode that might work, but what I'm more familiar with is the coo-to-csr route.
In the coo format, nonzero elements are defined by 3 vectors: i, j, and data. You can collect all the values for A, B, etc. in these arrays (applying the appropriate offset for the values in B etc.), without worrying about overlaps. Then when that format is converted to csr (for matrix calculations) the overlapping values are summed, which is exactly what you want.
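A minimal sketch of that duplicate-summing behavior (the values and shape here are made up for illustration):

```python
from scipy import sparse

# two entries share position (0,0); they are summed when the
# csr matrix is built from coo-style (data, (row, col)) input
i = [0, 0, 1]
j = [0, 0, 1]
dat = [1, 2, 5]
M = sparse.csr_matrix((dat, (i, j)), shape=(2, 2))
print(M.toarray())
# [[3 0]
#  [0 5]]
```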
I think the sparse documentation has some simple examples of this. Conceptually the simplest thing to do is iterate over the n submatrices and collect the values in those 3 arrays. But I also worked out a more complex scheme whereby it can be done as one big array operation, or by iterating over a smaller dimension. For example, each submatrix has 16 values; in a realistic case 16 will be much smaller than n.
I'd have to play around with code to give a more concrete example.
==========================
Here's a simple example with 3 blocks - functional, but not the most efficient
Define 3 blocks:
In [620]: A=np.ones((4,4),int)
In [621]: B=np.ones((4,4),int)*2
In [622]: C=np.ones((4,4),int)*3
Lists to collect the values in; they could be arrays, but it is easy and relatively efficient to append or extend lists:
In [623]: i, j, dat = [], [], []
In [629]: def foo(A,n):
    # turn A into a sparse matrix, and add its points to the lists
    # with an offset of 'n'
    ac = sparse.coo_matrix(A)
    i.extend(ac.row+n)
    j.extend(ac.col+n)
    dat.extend(ac.data)
In [630]: foo(A,0)
In [631]: i
Out[631]: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
In [632]: j
Out[632]: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
In [633]: foo(B,1)
In [634]: foo(C,2) # do this in a loop in the real world
In [636]: M = sparse.csr_matrix((dat,(i,j)))
In [637]: M
Out[637]:
<6x6 sparse matrix of type '<class 'numpy.int32'>'
with 30 stored elements in Compressed Sparse Row format>
In [638]: M.A
Out[638]:
array([[1, 1, 1, 1, 0, 0],
[1, 3, 3, 3, 2, 0],
[1, 3, 6, 6, 5, 3],
[1, 3, 6, 6, 5, 3],
[0, 2, 5, 5, 5, 3],
[0, 0, 3, 3, 3, 3]], dtype=int32)
If I've done this right, overlapping values of A,B,C are summed.
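To double-check, here is a self-contained comparison against a dense construction of the same overlapped sum (block values as above):

```python
import numpy as np
from scipy import sparse

A = np.ones((4, 4), int)
B = np.ones((4, 4), int) * 2
C = np.ones((4, 4), int) * 3

# dense reference: add each block at its diagonal offset
dense = np.zeros((6, 6), int)
for n, blk in enumerate((A, B, C)):
    dense[n:n+4, n:n+4] += blk

# coo-style collection; duplicates are summed on csr construction
i, j, dat = [], [], []
for n, blk in enumerate((A, B, C)):
    ac = sparse.coo_matrix(blk)
    i.extend(ac.row + n)
    j.extend(ac.col + n)
    dat.extend(ac.data)
M = sparse.csr_matrix((dat, (i, j)))

print((M.toarray() == dense).all())   # True
```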
More generally:
In [21]: def foo1(mats):
   ....:     i,j,dat = [],[],[]
   ....:     for n,mat in enumerate(mats):
   ....:         A = sparse.coo_matrix(mat)
   ....:         i.extend(A.row+n)
   ....:         j.extend(A.col+n)
   ....:         dat.extend(A.data)
   ....:     M = sparse.csr_matrix((dat,(i,j)))
   ....:     return M
   ....:
In [22]: foo1((A,B,C,B,A)).A
Out[22]:
array([[1, 1, 1, 1, 0, 0, 0, 0],
[1, 3, 3, 3, 2, 0, 0, 0],
[1, 3, 6, 6, 5, 3, 0, 0],
[1, 3, 6, 8, 7, 5, 2, 0],
[0, 2, 5, 7, 8, 6, 3, 1],
[0, 0, 3, 5, 6, 6, 3, 1],
[0, 0, 0, 2, 3, 3, 3, 1],
[0, 0, 0, 0, 1, 1, 1, 1]], dtype=int32)
Coming up with a way of doing this more efficiently may depend on how the individual submatrices are generated. If they are created iteratively, you might as well collect the i,j,data values iteratively as well.
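For instance, if each submatrix is produced one at a time, the collection can happen in the same loop; the generator here is a hypothetical stand-in for however the real blocks are made:

```python
import numpy as np
from scipy import sparse

i, j, dat = [], [], []
for n in range(5):
    blk = np.full((4, 4), n + 1)   # stand-in for the real submatrix generator
    bi, bj = np.nonzero(blk)       # indices of the nonzero entries
    i.extend(bi + n)               # shift by the block's diagonal offset
    j.extend(bj + n)
    dat.extend(blk[bi, bj])
M = sparse.csr_matrix((dat, (i, j)))
print(M.shape)   # (8, 8)
```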
==========================
Since the submatrices are dense, we can get the appropriate i, j, data values directly, without going through a coo intermediary, and without looping if the A, B, C are collected into one larger array.
If I modify foo1 to return a coo matrix, I see the i, j, data lists (as arrays) as given, without summation of duplicates. In the example with 5 matrices, I get 80-element arrays, which can be reshaped as
In [110]: f.col.reshape(-1,16)
Out[110]:
array([[0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3],
[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
[2, 3, 4, 5, 2, 3, 4, 5, 2, 3, 4, 5, 2, 3, 4, 5],
[3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6],
[4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7]], dtype=int32)
In [111]: f.row.reshape(-1,16)
Out[111]:
array([[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
[2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5],
[3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6],
[4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7]], dtype=int32)
In [112]: f.data.reshape(-1,16)
Out[112]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
I should be able to generate those without a loop, especially the row and col.
In [143]: mats=[A,B,C,B,A]
The coordinates for the elements of one array:
In [144]: I,J=[i.ravel() for i in np.mgrid[range(A.shape[0]),range(A.shape[1])]]
Replicate them with an offset via broadcasting:
In [145]: x=np.arange(len(mats))[:,None]
In [146]: I=I+x
In [147]: J=J+x
Collect the data into one large array:
In [148]: D=np.concatenate(mats,axis=0)
In [149]: f=sparse.csr_matrix((D.ravel(),(I.ravel(),J.ravel())))
Or, as a compact function:
def foo3(mats):
    # assumes all submatrices are dense with the same shape as the first
    A = mats[0]
    n,m = A.shape
    I,J = np.mgrid[range(n), range(m)]
    x = np.arange(len(mats))[:,None]
    I = I.ravel()+x
    J = J.ravel()+x
    D = np.concatenate(mats,axis=0)
    f = sparse.csr_matrix((D.ravel(),(I.ravel(),J.ravel())))
    return f
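As a quick sanity check that the loop and broadcast versions build the same matrix (reusing the block values from the earlier example):

```python
import numpy as np
from scipy import sparse

def foo1(mats):
    # loop version: one coo conversion per submatrix
    i, j, dat = [], [], []
    for n, mat in enumerate(mats):
        A = sparse.coo_matrix(mat)
        i.extend(A.row + n)
        j.extend(A.col + n)
        dat.extend(A.data)
    return sparse.csr_matrix((dat, (i, j)))

def foo3(mats):
    # broadcast version: all row/col indices in one shot
    n, m = mats[0].shape
    I, J = np.mgrid[0:n, 0:m]
    x = np.arange(len(mats))[:, None]
    I = I.ravel() + x
    J = J.ravel() + x
    D = np.concatenate(mats, axis=0)
    return sparse.csr_matrix((D.ravel(), (I.ravel(), J.ravel())))

A = np.ones((4, 4), int)
mats = [A, A * 2, A * 3, A * 2, A]
print((foo1(mats).toarray() == foo3(mats).toarray()).all())   # True
```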
In this modest example the 2nd version is 2x faster; the first scales linearly with the length of the list; the 2nd is almost independent of its length.
In [158]: timeit foo1(mats)
1000 loops, best of 3: 1.3 ms per loop
In [160]: timeit foo3(mats)
1000 loops, best of 3: 653 µs per loop