In general, you cannot. In general (though not in this case) the computation time will not change, because the work is physically there: no matter what accounting tricks you use to rewrite it (a single for loop that acts like two for loops, a recursive implementation, etc.), the work must still be done regardless of the order in which you do it.
> Is there a way to get rid of the for loops? Even if it slows down the run time a bit. – Cat_Smithyyyy03
Your case is special, though. While one normally has no reason to think a speedup is possible, you are effectively asking about tensor products, or in this case a matrix product. When you think about a general matrix multiplication AB = C, each entry of C is a dot product: C collects a·b for every vector a in the rows of A and every vector b in the columns of B.
So, for your case: first normalize all your vectors (so the normalization is not required later), stack them as the rows of A, let B = transpose(A), and then do a single matrix multiply:
[[a0 a1 a2]     [[a0 b0 ... z0      [[aa ab ac ... az
 [b0 b1 b2]       a1 b1 ... z1       ba bb bc ... bz
 [   ...  ]   x   a2 b2 ... z2]]  =   ...
 [z0 z1 z2]]                          za zb zc ... zz]]

(here "ab" denotes the dot product of vectors a and b, and so on)
Interestingly, you can now plug this into a fast matrix-multiply algorithm, which can be asymptotically faster than the naive O(dim * #vecs^2). Ideally the implementation is also optimized for the A times transpose(A) case, which produces a symmetric output and can save roughly a factor of 2 of the work (this is what BLAS's syrk routines do; a hypothetical interface might expose it as a flag like matrixmult(a, b, outputWillBeSymmetric)).
This is, unintuitively, faster than "O(N^2)": the rewrite exposes a substructure in the problem which can be leveraged to go faster than O(dim * #vecs^2). The exploitable substructure is precisely the fact that you are computing the outer product of THE SAME set of vectors with itself, and a fast matrix-multiply algorithm can leverage that.
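A minimal sketch of this in numpy (assuming `vectors` is an (N, d) float array with one vector per row, and that your dissimilarity is based on the normalized dot product, i.e. cosine; adapt to your actual metric):

import numpy as np

vectors = np.random.rand(1000, 64)   # stand-in data for illustration
A = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)  # normalize each row once
S = A @ A.T    # S[i, j] = dot product of normalized vectors i and j (cosine similarity)
D = 1.0 - S    # one common way to turn a similarity into a dissimilarity

Even without any asymptotic win, the single `A @ A.T` call dispatches to optimized BLAS code, which is typically orders of magnitude faster than a Python double loop over all pairs.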
edit: my original answer was wrong
You have a set of size N and you wish to compute f(a,b) for all a and b in the set.
Unless you know some values are trivial, there is no way to be asymptotically faster than O(N^2), because you have to assume the worst case: every pair f(a,b) may be distinct, so there is no way to do less than roughly N^2 work.
However, since your function f is symmetric, you can do half the work and then mirror it:

N = len(vectors)
s = np.empty((N, N))          # allocate the output (assumes numpy imported as np, as above)
for i in range(N):
    for v in range(i, N):     # only visit the upper triangle (including the diagonal)
        dissim = ...          # your dissimilarity metric f(vectors[i], vectors[v])
        s[i, v] = dissim
        s[v, i] = dissim      # mirror into the lower triangle, since f(a, b) == f(b, a)
(You can avoid calculating your metric in the reflexive case f(a,a), because it's trivial. That doesn't change the asymptotics, though: the fraction of such work, N/N^2, tends to zero as N increases, so it is only a worthwhile optimization if you are working with a very small number of vectors.)
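For instance, if the trivial reflexive value is 0 (an assumption; use whatever f(a, a) actually is for your metric), you can start the inner loop above at i + 1 instead of i and set the diagonal in one call:

# assuming f(a, a) == 0: skip the diagonal in the loop (use `for v in range(i + 1, N)`)
# and then fill it in directly
np.fill_diagonal(s, 0.0)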
Whether you should optimize further depends on whether you need to. Such code should be able to easily handle millions of small vectors. Next steps:
- Is there something fishy that is making my code very slow? What could it be? We don't have enough information to comment.
- If there is nothing fishy of the sort, you can try rephrasing the computation as matrix operations, in order to stay inside numpy's optimized C routines as much as possible instead of bouncing back and forth into Python for every pair (a sketch follows after this list). This is ugly and I would avoid it unless you need the speed, because your code's readability will decrease.
- If you are dealing with hundreds of millions of vectors, perhaps consider a more cache-friendly approach where you do
  BLOCK = 10**6
  for blockI in range(N // BLOCK):
      for blockV in range(N // BLOCK):
          for i in range(blockI * BLOCK, (blockI + 1) * BLOCK):
              for v in range(blockV * BLOCK, (blockV + 1) * BLOCK):
                  ...  # compute s[i, v] as in the loop above
- If you're dealing with billions of vectors, then look into leveraging the GPU (GPGPU). This kind of embarrassingly parallel pairwise computation is quite ideal for the GPU and might give a speedup of a factor of thousands.
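As a sketch of the second bullet ("rephrasing as matrix operations"), here is one way to compute pairwise Euclidean distances (an assumed metric; yours may differ) entirely inside numpy, using the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a·b:

import numpy as np

vectors = np.random.rand(1000, 64)                  # stand-in data for illustration
sq = np.einsum('ij,ij->i', vectors, vectors)        # ||a||^2 for every row
d2 = sq[:, None] + sq[None, :] - 2.0 * (vectors @ vectors.T)
np.maximum(d2, 0.0, out=d2)                         # clamp tiny negatives from round-off
dist = np.sqrt(d2)                                  # dist[i, j] == ||vectors[i] - vectors[j]||

All of the per-pair arithmetic stays inside numpy's C/BLAS routines; Python is only involved a constant number of times rather than N^2 times, which is where the practical speedup comes from.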