It is possible to implement a solution using only numpy. First, I assume that your vectors v and matrices S are gathered into numpy arrays, with shapes (n, 3) and (n, 3, 3) respectively.
If not, it is possible to concatenate the vectors and matrices using np.concatenate (here d = 3, the dimension of the vectors):
S = np.concatenate([s.reshape(1, d, d) for s in [s_1, s_2, ..., s_n]], axis=0)
V = np.concatenate([v.reshape(1, -1) for v in [v_1, v_2, ..., v_n]], axis=0)
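Equivalently, np.stack adds the leading axis for you. A minimal sketch with made-up inputs (the s_i and v_i below are purely illustrative):

import numpy as np

s_1, s_2 = np.eye(3), 2 * np.eye(3)  # illustrative matrices
v_1, v_2 = np.zeros(3), np.ones(3)   # illustrative vectors

S = np.stack([s_1, s_2])  # shape (2, 3, 3)
V = np.stack([v_1, v_2])  # shape (2, 3)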
I also assume that A is stored in a numpy array of shape (m, 3).
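For the examples below, one can generate random test data with the assumed shapes (the sizes and seed are arbitrary choices for illustration):

import numpy as np

rng = np.random.default_rng(0)      # fixed seed, for reproducibility
m, n = 100, 100                     # illustrative sizes
A = rng.standard_normal((m, 3))     # m points a_i
V = rng.standard_normal((n, 3))     # n vectors v_j
S = rng.standard_normal((n, 3, 3))  # n matrices S_j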
The naive implementation (using for loops) is the following:
def naive(A, S, V):
    assert S.shape[0] == V.shape[0]
    M = np.zeros((A.shape[0], S.shape[0]))
    for i in range(A.shape[0]):
        for j in range(V.shape[0]):
            diff = A[i] - V[j]
            # diff^T @ S[j] @ diff, extracted as a Python scalar with .item()
            M[i, j] = np.dot(
                np.dot(diff.reshape(1, -1), S[j]),
                diff.reshape(-1, 1),
            ).item()
    return M
And the optimized one, using only numpy:
def only_numpy(A, S, V):
    m, n = A.shape[0], V.shape[0]
    # (m, n, 3): all pairwise differences A[i] - V[j], via broadcasting
    diff = A.reshape(m, 1, 3) - V.reshape(1, n, 3)
    # (m, n, 1, 3) @ (1, n, 3, 3) -> (m, n, 1, 3), then @ (m, n, 3, 1) -> (m, n, 1, 1)
    return np.matmul(
        np.matmul(
            diff.reshape(m, n, 1, 3),
            S.reshape(1, n, 3, 3),
        ),
        diff.reshape(m, n, 3, 1),
    ).reshape(m, n)
We factor out the subtraction using numpy broadcasting, and we rely on matmul, which broadcasts over the leading dimensions, to batch all the matrix multiplications (see this post).
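As a small illustration of how the shapes broadcast (the sizes here are arbitrary; only the shapes matter):

import numpy as np

m, n = 4, 5                    # arbitrary small sizes
x = np.ones((m, n, 1, 3))      # stands in for diff.reshape(m, n, 1, 3)
y = np.ones((1, n, 3, 3))      # stands in for S.reshape(1, n, 3, 3)
z = np.ones((m, n, 3, 1))      # stands in for diff.reshape(m, n, 3, 1)

# matmul multiplies over the last two axes and broadcasts the leading ones:
print(np.matmul(x, y).shape)                # (4, 5, 1, 3)
print(np.matmul(np.matmul(x, y), z).shape)  # (4, 5, 1, 1)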
We can verify that both functions return the same result with:
assert np.allclose(naive(A, S, V), only_numpy(A, S, V))
Concerning the optimisation, I ran some experiments with d = 3 and n = m (the sizes shown in the tables below). The naive, only_numpy and einsum columns report elapsed time in seconds (timed as in this post) and peak memory in MiB (measured with memory-profiler at its default resolution). A sketch for reproducing the timings follows the tables.
Elapsed time in seconds:

n      m      naive      only_numpy  einsum
100    100    0.185373   0.000584    0.000701
1000   1000   18.964981  0.044783    0.054029
10000  10000  N/A        4.228531    4.098396

Peak memory in MiB:

n      m      naive  only_numpy  einsum
100    100    63.6   63.6        63.6
1000   1000   101.9  109.8       102.0
10000  10000  N/A    4689.3      3918.7
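For reference, here is a minimal sketch of how the timing column can be reproduced (exact numbers are hardware-dependent, and timeit with these sizes is my choice for illustration, not necessarily the original setup):

import timeit
import numpy as np

rng = np.random.default_rng(0)

for size in (100, 1000):
    m = n = size
    A = rng.standard_normal((m, 3))
    V = rng.standard_normal((n, 3))
    S = rng.standard_normal((n, 3, 3))
    # average over a few runs to smooth out noise
    t = timeit.timeit(lambda: only_numpy(A, S, V), number=10) / 10
    print(f"n = m = {size}: {t:.6f} s")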
The einsum implementation is from David (see the other answer).
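For completeness, a sketch of what such an einsum variant can look like (my reconstruction, not necessarily David's exact code):

import numpy as np

def einsum_version(A, S, V):
    diff = A[:, None, :] - V[None, :, :]  # (m, n, 3) pairwise differences
    # M[i, j] = diff[i, j] @ S[j] @ diff[i, j], summing over the last axes
    return np.einsum('mnj,njk,mnk->mn', diff, S, diff)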