Given two arrays A and B, how to get B values which are the closest to A

Question

Suppose I have two arrays ordered in an ascending order, i.e.:

A = [1 5 7], B = [1 2 3 6 9 10]

I would like to create from B a new vector B', which contains only the closest values to A values (one for each).

I also need the indexes. So, in my example I would like to get:

B' = [1 6 9], Idx = [1 4 5]

Note that the third value is 9. Indeed 6 is closer to 7 but it is already 'taken' since it is close to 4.

Any idea for a suitable code?

Note: my true arrays are much larger and contain real (not int) values

Also, it is given that B is longer then A

Thanks!

You can use `interp1`. Possible duplicate of http://stackoverflow.com/a/40809851/6579744 — rahnema1, Dec 26 '16 at 16:20

Thales · Answer 1 · 2016-12-26T15:48:33.967

Assuming you want to minimize the overall discrepancies between elements of A and matched elements in B, the problem can be written as an assignment problem of assigning to every row (element of A) a column (element of B) given a cost matrix C. The Hungarian (or Munkres') algorithm solves the assignment problem.

I assume that you want to minimize cumulative squared distance between A and matched elements in B, and use the function [assignment,cost] = munkres(costMat) by Yi Cao from https://www.mathworks.com/matlabcentral/fileexchange/20652-hungarian-algorithm-for-linear-assignment-problems--v2-3-:

A = [1 5 7];
B = [1 2 3 6 9 10];
[Bprime,matches] = matching(A,B)

function [Bprime,matches] = matching(A,B)
    C = (repmat(A',1,length(B)) - repmat(B,length(A),1)).^2;
    [matches,~] = munkres(C);
    Bprime = B(matches);
end

Assuming instead you want to find matches recursively, as suggested by your question, you could either walk through A, for each element in A find the closest remaining element in B and discard it (sortedmatching below); or you could iteratively form and discard the distance-minimizing match between remaining elements in A and B until all elements in A are matched (greedymatching):

A = [1 5 7];
B = [1 2 3 6 9 10];
[~,~,Bprime,matches] = sortedmatching(A,B,[],[])
[~,~,Bprime,matches] = greedymatching(A,B,[],[])

function [A,B,Bprime,matches] = sortedmatching(A,B,Bprime,matches)
    [~,ix] = min((A(1) - B).^2);
    matches = [matches ix];
    Bprime = [Bprime B(ix)];
    A = A(2:end);
    B(ix) = Inf;
    if(not(isempty(A)))
        [A,B,Bprime,matches] = sortedmatching(A,B,Bprime,matches);
    end
end

function [A,B,Bprime,matches] = greedymatching(A,B,Bprime,matches)
    C = (repmat(A',1,length(B)) - repmat(B,length(A),1)).^2;
    [minrows,ixrows] = min(C);
    [~,ixcol] = min(minrows);
    ixrow = ixrows(ixcol);
    matches(ixrow) = ixcol;
    Bprime(ixrow) = B(ixcol);
    A(ixrow) = -Inf;
    B(ixcol) = Inf;
    if(max(A) > -Inf)
        [A,B,Bprime,matches] = greedymatching(A,B,Bprime,matches);
    end
end

While producing the same results in your example, all three methods potentially give different answers on the same data.

score 0 · Answer 2 · answered Dec 26 '16 at 16:08

Normally I would run screaming from for and while loops in Matlab, but in this case I cannot see how the solution could be vectorized. At least it is O(N) (or near enough, depending on how many equally-close matches to each A(i) there are in B). It would be pretty simple to code the following in C and compile it into a mex file, to make it run at optimal speed, but here's a pure-Matlab solution:

function [out, ind] = greedy_nearest(A, B)

if nargin < 1, A = [1 5 7]; end
if nargin < 2, B = [1 2 3 6 9 10]; end

ind = A * 0;
walk = 1;
for i = 1:numel(A)
    match = 0;
    lastDelta = inf;
    while walk < numel(B)
        delta = abs(B(walk) - A(i));
        if delta < lastDelta, match = walk; end
        if delta > lastDelta, break, end
        lastDelta = delta;
        walk = walk + 1;
    end
    ind(i) = match;
    walk = match + 1;
end
out = B(ind);

NLindros · Answer 3 · 2016-12-27T08:01:29.930

You could first get the absolute distance from each value in A to each value in B, sort them and then get the first unique value to a sequence when looking down in each column.

% Get distance from each value in A to each value in B
[~, minIdx] = sort(abs(bsxfun(@minus, A,B.')));

% Get first unique sequence looking down each column
idx = zeros(size(A));
for iCol = 1:numel(A)
    for iRow = 1:iCol
        if ~ismember(idx, minIdx(iRow,iCol))
            idx(iCol) = minIdx(iRow,iCol);
            break
        end
    end
end

The result when applying idx to B

>> idx
1     4     5
>> B(idx)
1     6     9

Given two arrays A and B, how to get B values which are the closest to A

3 Answers3