My question is related to "Block mean of numpy 2D array" and "block mean of 2D numpy array (in both dimensions)"; in fact, it is just a more general case. I will explain it with a simple example.
Let's assume we have a 6x6 2D array:
array([[7, 1, 6, 6, 4, 2],
[8, 5, 5, 6, 3, 5],
[3, 1, 7, 1, 3, 4],
[6, 8, 3, 2, 3, 3],
[8, 6, 7, 1, 1, 3],
[8, 5, 4, 5, 1, 4]])
Now each row (and column) in this matrix is assigned to one of three communities (communities can be of different sizes); e.g. array([0, 0, 1, 1, 1, 2]) would represent this assignment. I need to split the matrix according to this assignment and calculate the mean over each block (slice). This would produce a 3x3 matrix of block means. For example, the block (or slice) for community pair (0, 0) is a 2x2 array:
array([[7, 1],
[8, 5]])
which has a mean of 5.25. The block for community pair (0, 1) is a 2x3 array:
array([[6, 6, 4],
[5, 6, 3]])
with a mean of 5, and so on.
The resulting array of block means should look like this:
array([[5.25 , 5. , 3.5 ],
[5.33333333, 3.11111111, 3.33333333],
[6.5 , 3.33333333, 4. ]])
My question is how to calculate this efficiently. For now I am using for loops: for each pair of communities I take the corresponding slice, calculate the mean over that slice, and store it in a separate matrix. However, I need to perform this operation many times, and it takes a lot of time.
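To make the loop-based approach concrete, here is a minimal sketch of it on the example data. The names `A`, `labels`, and `block_means_loop` are hypothetical and chosen only for illustration, and the sketch assumes the same community assignment applies to both rows and columns, as in the example above.

```python
import numpy as np

# Example data from the question.
A = np.array([[7, 1, 6, 6, 4, 2],
              [8, 5, 5, 6, 3, 5],
              [3, 1, 7, 1, 3, 4],
              [6, 8, 3, 2, 3, 3],
              [8, 6, 7, 1, 1, 3],
              [8, 5, 4, 5, 1, 4]])
labels = np.array([0, 0, 1, 1, 1, 2])  # community of each row/column


def block_means_loop(a, labels):
    """Mean of every (row community, column community) block, using explicit loops.

    Hypothetical helper; assumes `a` is square and `labels` has one entry per row.
    """
    communities = np.unique(labels)
    out = np.empty((communities.size, communities.size))
    for i, ci in enumerate(communities):
        rows = labels == ci              # boolean mask selecting rows of community ci
        for j, cj in enumerate(communities):
            cols = labels == cj          # boolean mask selecting columns of community cj
            # np.ix_ builds an open mesh so the masks select a rectangular block.
            out[i, j] = a[np.ix_(rows, cols)].mean()
    return out


result = block_means_loop(A, labels)
print(result)
```

This does one slice and one mean per community pair, so the Python-level loop overhead grows with the square of the number of communities, which is the part I would like to avoid.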
I cannot really use (or I don't know how to use) approaches based on reshape, since they assume equal block sizes.