Consider the code below, which performs an operation in non-vectorized form:
import numpy as np
N = 110
observables = np.random.rand(N)
I = np.random.randint(np.array([0,0])[:,None], np.array([4,5])[:,None],
size=(2,N))
# you can think of this as two groupings of the data, with the numbers
# indicating separate groups (one grouping has 4 groups, the other 5).
# I want to compute the average for each pair i,j, where i is the index
# from the first grouping and j the index from the second
averages = np.zeros((4,5))
#unvectorized
for i in range(4):
    for j in range(5):
        J = np.argwhere((I[0,:]==i) & (I[1,:]==j)).flatten()
        averages[i,j] = np.mean(observables[J])
I came up with the following vectorization, but it is very inefficient for large N, since the least common multiple of the group sizes grows to intractable values even for something like N=1000:
import math
#vectorized inefficiently
J = [np.argwhere((I[0,:]==i) & (I[1,:]==j)).flatten() for i in range(4)
for j in range(5)]
lengths = [len(j) for j in J]
L = math.lcm(*lengths)
J = np.array([np.tile(j,int(L/len(j))) for j in J])
averages_vectorized = np.reshape(np.mean(observables[J], axis=1), (4,5))
Is there any other way to vectorize this? For instance, can a list of indices like [0, 1, 2] be extended to something like [0, 1, 2, sth, sth] such that, when I index a NumPy array with it, the sth entries are not taken into account?
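To make that padding idea concrete, here is a rough sketch of the kind of thing I mean, using NumPy masked arrays: every index list is padded to the same length with a dummy index, and the padded positions are masked so they are ignored by the mean (variable names like L_max, J_padded and pad_mask are just illustrative; this continues from the setup above):
import numpy.ma as ma
J = [np.argwhere((I[0,:]==i) & (I[1,:]==j)).flatten() for i in range(4)
     for j in range(5)]
L_max = max(len(j) for j in J)
# pad every index list to length L_max with a dummy index 0
J_padded = np.array([np.pad(j, (0, L_max - len(j)), constant_values=0) for j in J])
# True marks a padded position, so it does not enter the mean
pad_mask = np.array([[k >= len(j) for k in range(L_max)] for j in J])
masked_vals = ma.masked_array(observables[J_padded], mask=pad_mask)
averages_masked = np.reshape(masked_vals.mean(axis=1), (4,5))
This avoids the LCM blow-up, but it is only an illustration of what "not taken into account" would mean in plain NumPy.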
ps:
There is also the following way, which just hides the for loops in two list comprehensions:
J = [np.argwhere((I[0,:]==i) & (I[1,:]==j)).flatten() for i in range(4)
for j in range(5)]
averages_list_comprehension = np.reshape(np.array([np.mean(observables[j]) for j in J]), (4,5))
ps2: One way to do it is to append a nan value at the end of observables, extend every element of J with the index N until all are of the same size, and then use np.nanmean. However, my end goal is to apply this in the context of tensors from PyTensor, so I am not sure how one would do the same there (in that case observables would be scalar tensors, and I don't know whether there is a nan tensor and a nanmean in PyTensor).
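For reference, a minimal NumPy sketch of that nan-padding idea (padding every index list to the maximum group size rather than to a least common multiple; names like obs_ext and J_padded are just placeholders):
obs_ext = np.append(observables, np.nan)  # obs_ext[N] is nan
J = [np.argwhere((I[0,:]==i) & (I[1,:]==j)).flatten() for i in range(4)
     for j in range(5)]
L_max = max(len(j) for j in J)
# pad every index list with the index N, which points at the appended nan
J_padded = np.array([np.pad(j, (0, L_max - len(j)), constant_values=N) for j in J])
averages_nanmean = np.reshape(np.nanmean(obs_ext[J_padded], axis=1), (4,5))
The open part of the question is whether an equivalent of np.nan and np.nanmean exists on the PyTensor side.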