We have an N x N matrix consisting of n x n blocks, so there are (N/n) x (N/n) blocks. We further divide it into large blocks so that each large block contains m x m smaller blocks. We then need to sum (block-wise) the smaller blocks inside each larger block, which gives one n x n block per large block. For example, here each A is n x n and m = 2.
What is the simplest, and preferably fast, way of doing that with a numpy array?
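
To make the intended operation concrete, here is a minimal sketch of one possible approach, not necessarily the fastest one. It assumes the matrix is square, that N is divisible by n*m, and that the result should be the (N/m) x (N/m) matrix made of the summed n x n blocks. The function name `sum_small_blocks` is made up for illustration. The idea is to reshape each axis into (large block index, small block index within the large block, offset within the small block) and sum over the two "small block index" axes.

```python
import numpy as np

def sum_small_blocks(X, n, m):
    """Sum the m x m small (n x n) blocks inside each large block of X.

    Hypothetical helper: assumes X is N x N with N divisible by n * m.
    Returns an (N/m) x (N/m) array holding one n x n sum per large block.
    """
    X = np.asarray(X)
    if X.ndim != 2 or X.shape[0] != X.shape[1]:
        raise ValueError("X must be a square 2-D array")
    N = X.shape[0]
    if N % (n * m) != 0:
        raise ValueError("N must be divisible by n * m")

    L = N // (n * m)  # number of large blocks along each axis

    # Axes after reshape:
    #   0: large-block row, 1: small-block row inside it, 2: row inside small block
    #   3: large-block col, 4: small-block col inside it, 5: col inside small block
    Y = X.reshape(L, m, n, L, m, n)

    # Summing over axes 1 and 4 adds up the m x m small blocks of each large block.
    S = Y.sum(axis=(1, 4))  # shape (L, n, L, n)

    # Put the n x n sums back into an ordinary 2-D layout.
    return S.reshape(L * n, L * n)

X = np.arange(64).reshape(8, 8)   # N = 8
result = sum_small_blocks(X, n=2, m=2)
print(result.shape)  # (4, 4)
```

The reshape and sum are both vectorized, so there are no Python-level loops over blocks. If the n x n sums are wanted as separate blocks rather than one 2-D array, the intermediate `S` with shape `(L, n, L, n)` can be transposed to `(L, L, n, n)` instead of being reshaped.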