Pairwise weighted distance vectorization

Question

The following efficient and vectorized Matlab code computes the Weighted Euclidean Distance between 2 sets of points A and B using a weight vector WTS (1 weight for each dimension; same weights for all points):

    WTS = sqrt(WTS); 

    % modify A and B against weight values
    A = WTS(ones(1,size(A,1)),:).*A;
    B = WTS(ones(1,size(B,1)),:).*B; 

    % calculate distance
    AA = sum(A.*A,2);  
    BB = sum(B.*B,2)'; 
    D = sqrt(AA(:,ones(1,size(B,1))) + BB(ones(1,size(A,1)),:) - 2*A*B');

(source: https://github.com/nolanbconaway/pairdist/blob/master/pairdist.m)

My question is: is there an efficient vectorized form (Matlab, R or Julia are fine) for a similar computation with the difference that WTS is a set of weight vectors with the same size as A? In other words, instead of 1 weight vector, I need 1 weight vector for each point in A.

This answer seems to do what I need, but it is in Python and I'm not sure about how to convert it to Matlab/R/Julia: https://stackoverflow.com/a/19285289/834518

Also, not a duplicate of Efficiently calculating weighted distance in MATLAB, as that question deals with the single weight vector case and I'm explicitly asking for the N weight vectors case.

EDIT: example aplications: RBF Networks and Gaussian Mixture Models, where you (can) have 1 weight vector for each neuron/component. An efficient solution to the problem is essential for those kinds of problems.

Have you tried any changes yourself that might get you closer to a solution? What did you find? — mbauman, Jan 03 '18 at 18:24
@rahnema1 Not a duplicate, that's the case with just 1 weight vector. — rcpinto, Jan 03 '18 at 18:27
@MattB. It is indeed simple for the 1 weight vector case, but I simply can't see how to do the same when there are as much weight vectors as points in A, at least with as much efficiency. And I tried some not-so-sciency ad-hoc modifications such as multiplying just A by WTS or including it in the 2AB term, obtaining catastrophic results. In other words, I tried a lot of things before asking. — rcpinto, Jan 03 '18 at 18:27
@MattB. Probably would need to compute a 3D matrix with all pairwise differences (and not distances) and then I'd be able to multiply each one with it's corresponding weight vector (I think that's what he did in the Python answer), but again, I'm not sure about how to do that efficiently. — rcpinto, Jan 03 '18 at 18:44
My tries, including @rahnema1's solution: https://imgur.com/a/REl6T Note how fast is the first solution, albeit still not including the weights. That's why I'd like to find a way to include them directly into that solution if possible. But we're already much better than the 2-for's solution. — rcpinto, Jan 03 '18 at 22:24
pastebin if anyone wants to copy the code: https://pastebin.com/0bnms3gE — rcpinto, Jan 03 '18 at 22:32
It's probably worth checking whether you can incorporate the weighted Euclidean method from the [Distances package](https://github.com/JuliaStats/Distances.jl). There are all sorts of neat optimizations in that package that regular julia code might miss. — Colin T Bowers, Jan 03 '18 at 22:48
Don't use `Float32`. Result of the test using default Float64 is `1.73` ,`81.22`, `23.50` and `2.81` seconds. — rahnema1, Jan 04 '18 at 04:20
@rahnema1 rcpinto , Rerunning the versions in the pastebin, and fixing a few bugs (like not taking sqrt in the first three versions) and adding the Distance.jl version shows Distance.jl is clearly more efficient by a 2x margin. It would be interesting to dig further, but only if there is proper benchmarking (using BenchmarkTools package). Otherwise, accepting (and using) the Distance.jl package version seems to be the best way forward — Dan Getz, Jan 05 '18 at 10:42
@DanGetz As I commented under Distance.jl solution it requires that `A` and `B` to be of the same size and `W` to be a vector but OP wants `A` and `B` to be of different sizes and `W` to be a matrix with the same size as `A`. Can you please show your code? — rahnema1, Jan 05 '18 at 14:29
@rahnema1 It does not require `A` and `B` to be the same size (perhaps transposed from code in question, but this is not an issue usually). As for `W`, it needs to be a vector of the dimension length. rahnema1, rcpinto asked the question, why do you comment instead of OP? — Dan Getz, Jan 05 '18 at 17:17
@DanGetz I commented to reply your previous comment that mentioned my name. Do you think that is a bad thing? — rahnema1, Jan 05 '18 at 17:29
@rahnema1 Just trying to understand who is asking the question, since he knows best if a solution fits. Reiterating other commenters and juliohm's answer, try Distance.jl and it won't disappoint. — Dan Getz, Jan 05 '18 at 17:36
I think my question is clear: WTS is the same size as A. It is 1 different weight vector for each point in A. So no, Distance.jl is not a solution. — rcpinto, Jan 05 '18 at 18:06

score 5 · Answer 1 · answered Jan 03 '18 at 20:07

5

In Julia you don't have to vectorize it to be efficient, just write the loop and it'll be faster than these vectorized forms anyways because it can fuse and get rid of the temporaries. Here's a pretty efficient implementation of pairwise applies in Julia that you can work from. It has all of the bells and whistles, but you can pair it down if you want.

Note that vectorization isn't necessarily "fast", it's just faster than looping in R/Python/MATLAB because it's only doing a single function call into an optimized kernel written in a lower level language (C/C++) which is actually looping. But putting together vectorized functions usually has a lot of temporary allocations since each vectorized functions return arrays. Thus if you really need efficiency, you should avoid vectorizing in general and write it in a language that allows for low cost function calls / loops. This post explains more about issues with vectorization in high level languages.

That answers one of the three questions you have. I don't have a good answer for MATLAB or R.

answered Jan 03 '18 at 20:07

Chris Rackauckas

18,645
3
50
81

I'm aware of Julia's good non-vectorized performance and also thought it would be the best solution, but I did try a non-vectorized solution for a non-weighted version and it was way slower. However, it was a naive 2-for implementation and maybe your linked suggestions can give me better results. I'll try to work on them, thanks :) – rcpinto Jan 03 '18 at 20:30
Did you put it in a function, check its type-stability with `@code_warntype`, and get rid of all temporary allocations? – Chris Rackauckas Jan 03 '18 at 20:51
I just added my functions in a comment in the question. I'd be glad if you could take a look at them. I did run them with @code_warntype but couldn't identify anything I could do (it's been a long time since I last used Julia and I still don't quite remember how to use those logs correctly). – rcpinto Jan 03 '18 at 22:28
1

`D[i,j] = sum(((X[i, :] - Y[j, :]).*W[i,:]).^2)`. Two major things wrong with that. First of all, each `X[i,:]` slices, so that creates temporary arrays. Secondly, that's written using rows. Always compute along columns in a column-order language (this is true of MATLAB as well). Both of these are really really not good. You should have another loop in there to cut the temporary and change the ordering. – Chris Rackauckas Jan 04 '18 at 02:50

rahnema1 · Accepted Answer · 2018-01-06T17:00:44.907

Here is a vectorized version in MATLAB(R2016b and later versions):

W2 = 1./W.^2;
D = sqrt(sum((A./W).^2 ,2) - 2 * (A .* W2) * B.' +W2 * (B.^2).');

In pre R2016b versions You can use this:

W2 = 1./W.^2;
D = sqrt(bsxfun(@plus,sum((A./W).^2 ,2) , -2 * (A .* W2) * B.' +W2 * (B.^2).'));

Translation of MATLAB to julia:

W2 = 1./W.^2;
z=sqrt.(broadcast(+,sum((A./W).^2 ,2) , -2 * (A .* W2) * B.' .+W2 * (B.^2).'));

Here my proposed method,Vectorization, is compared to the Loop method provided by @DanGetz. Other solutions are not applicable here.

We can see that for dimensions less than 128 the loop version is faster than the vectorized version. Performance of the loop version would get worse as number of dimensions increases.

The following code used to produce the figure:

function pdist_vectorized (A::Matrix, B::Matrix, W::Matrix)
    W2 = 1./W.^2;
    return sqrt.(broadcast(+,sum((A./W).^2 ,2) , -2 * (A .* W2) * B.' .+W2 * (B.^2).'));
end

result = zeros(10,2);
for i = 1:10
    A = rand( 3000, 2^i);
    B = rand( 2000, 2^i);
    W = ones(size(A));
    result[i,1]=(@timed pdist_1alloc(A,B,W))[2];
    result[i,2]=(@timed pdist_vectorized(A,B,W))[2];
end

using Plots
pyplot()
plot(2.^(1:10), result, title="Pairwise Weighted Distance",
    label=["Loop" "Vectorization"], lw=3,
    xlabel = "Dimension", ylabel = "Time Elapsed(seconds)")

Thanks, @rahnema1. Please, take a look at my new comment in the question, maybe you have something to add. — rcpinto, Jan 03 '18 at 22:25

juliohm · Answer 3 · 2018-01-04T20:16:40.103

3

As an additional information for future readers, the package Distances.jl has efficient implementations of most distances you can think of. As a general advise, if an operation is very common in scientific computing, there will be a package implementing it well.

using Distances

D = pairwise(WeightedEuclidean(weights), A, B)

edited Jan 04 '18 at 20:16

answered Jan 04 '18 at 02:48

juliohm

3,691
2
18
22

There are two problems with your solution: 1. A and B should be matrices of the same size. 2. weights should be a vector. – rahnema1 Jan 05 '18 at 04:18
No, A and B need not be of the same size. Regarding the weights, isn't it trivial to figure out by yourself? The point of my answer is that there is a package out there that should be used instead of reinventing the wheel. The rest are just details. – juliohm Jan 05 '18 at 05:59
1

The matrices A and B (or X and Y) should contain the coordinates of the points as columns. The vector of weights should contain one weight for each dimension of the space. If you are in R^2, these are 2 weights. This is the minimum representation to compute pairwise weighted Euclidean efficiently. Anything that deviates from this representation is provably suboptimal. – juliohm Jan 05 '18 at 06:16
No, suboptimal because any extra amount of bits to represent this task is unnecessary. Furthermore, solutions like the one you typed are hard to read, and will likely produce temporaries in case you forget to type a single dot "." in it. – juliohm Jan 05 '18 at 06:51
Thanks for comments. I am waiting for a much faster (optimal) and correct solution to the question. – rahnema1 Jan 05 '18 at 07:08
1

@juliohm the "details" are precisely my question. Also, using more weights is not suboptimal, they are different problems altogether. It doesn't make sense to have just 1 weight vector in a RBF network or a Gaussian Mixture. – rcpinto Jan 05 '18 at 15:35
Thank you @rcpinto, glad you solved your problem. Future users will have similar problems and each answer provides a variation of interest. – juliohm Jan 05 '18 at 20:44

Dan Getz · Answer 4 · 2018-01-06T11:45:24.090

2

Another version optimized to allocate the result matrix and nothing else:

function pdist_1alloc(A::Matrix, B::Matrix, W::Matrix)
    LA, LD = size(A) ; LB = size(B,1)
    res = zeros(LB, LA)
    indA = 0 ; indB = 0 ; indres = 0
    @inbounds for i=1:LD
        for j=1:LA
            a = A[indA+j] ; w = W[indA+j] ; a2w = a^2*w ; awtmp = -2.0*a*w
            for k=1:LB
                indres += 1
                b = B[indB+k] ; b2w = b^2*w
                res[indres] += a2w+awtmp*b+b2w
            end
        end
        indA += LA ; indB += LB ; indres = 0
    end
    res .= sqrt.(res)
    return res
end

It is about 2x faster than @rahnema1's version, and uses the same tricks, but not as readable. In addition, I apologize for misunderstanding the exact setup of the question in the first place (and suggesting Distance.jl which is not directly applicable here).

edited Jan 06 '18 at 11:45

answered Jan 06 '18 at 11:29

Dan Getz

17,002
2
23
41

Thanks, Dan, and don't worry about the misunderstanding. Anyway, did you compare the results of your function with the other versions? I think it is missing something, as the weighted distance between (1,1) and (3,3) with weight (2,2) is giving me 4, while it should be sqrt(32). ( sqrt((2*2)^2 + (2*2)^2) ) – rcpinto Jan 06 '18 at 17:35
@rahnema1 Perhaps we still have different definitions for "weighted distance". rahnema1's gives me sqrt(2) for this distance. My version gives 4.0. (considering a 1 vector A with a 1 vector B - gives a single value). According to my understanding, the square difference of each dimension is multiplied by the weight, i.e. (3-1)^2*2+(3-1)^2*2 = 16 and sqrt-ing gives 4.0. What should be the correct answer? `pdist([1.0 1.0], [3.0 3.0], [2.0 2.0])` – Dan Getz Jan 06 '18 at 17:44
The definition is gotten from the [python](https://stackoverflow.com/a/19285289/834518) post. I think you should transpose the result. – rahnema1 Jan 06 '18 at 17:52
@rahnema1 The transpose is right (and immaterial), but the values our versions give is different. For `pdist([1.0 1.0], [3.0 3.0], [2.0 2.0])`, one is `sqrt(2.0)` and the other is `4.0` – Dan Getz Jan 06 '18 at 17:53
The definition a I'm working with is ||(Xi - Yj)/Wi|| (or ||(Xi - Yj)*Wi||), I'm not sure if there is a canonical mathematical form considered to be the right one. – rcpinto Jan 06 '18 at 17:57
Well, `pdist_1alloc` returns neither of these, but rather the *weighted Euclidean distance* as defined in: https://math.stackexchange.com/questions/917066/calculating-weighted-euclidean-distance-with-given-weights , https://stats.stackexchange.com/questions/15289/when-to-use-weighted-euclidean-distance-and-how-to-determine-the-weights-to-use , http://84.89.132.1/~michael/stanford/maeb4.pdf . Which is the same as using the square of the weights in the definition in the comment above. – Dan Getz Jan 06 '18 at 18:06
1

Thanks, Dan, it makes sense. Both definitions are fine for me, since those weights will be learned anyway. BTW, I did new tests on JuliaBox with A 2000x1000 and B 3000x1000, with Float32 (left) and Float64 (right): https://imgur.com/a/S57YU (4th row is rahnema1's and 5th is yours). – rcpinto Jan 06 '18 at 18:32

Pairwise weighted distance vectorization

4 Answers4