Since M
is binary, we can think about this as a graph problem. i
,j
in {1..n}
correspond to nodes, and M(i,j)
indicates whether there is an undirected edge connecting them.
Since A
,B
,C
,D
are disjoint, that simplifies the problem a bit. We can approach the problem in stages:
- Find all
(c,d)
for which there exists a
such that M(a,c) < M(a,d)
. Let's call this set CD_lt_a
, (the subset of C
*D
such that the "less than" inequality holds for some a
).
- Find all
(c,d)
for which there exists a
such that M(a,c) <= M(a,d)
, and call this set CD_le_a
.
- Repeat for
b
, forming CD_lt_b
for M(b,d) < M(b,c)
and CD_le_b
for M(b,d)<=M(b,c)
.
- One way to satisfy the overall inequality is for
M(a,c) < M(a,d)
and M(b,d) <= M(b,c)
, so we can look at the intersection of CD_lt_a
and CD_le_b
.
- The other way is if
M(a,c) <= M(a,d)
and M(b,d) < M(b,c)
, so look at the intersection of CD_le_a
and CD_lt_b
.
- With
(c,d)
known, we can go back and find the (a,b)
.
And so my implementation is:
% 0. Some preliminaries
% Get the size of each set
mA = numel(A); mB = numel(B); mC = numel(C); mD = numel(D);
% 1. Find all (c,d) for which there exists a such that M(a,c) < M(a,d)
CA_linked = M(C,A);
AD_linked = M(A,D);
CA_not_linked = ~CA_linked;
% Multiplying these matrices tells us, for each (c,d), how many nodes
% in A satisfy this M(a,c)<M(a,d) inequality
% Ugh, we need to cast to double to use the matrix multiplication
CD_lt_a = (CA_not_linked * double(AD_linked)) > 0;
% 2. For M(a,c) <= M(a,d), check that the converse is false for some a
AD_not_linked = ~AD_linked;
CD_le_a = (CA_linked * double(AD_not_linked)) < mA;
% 3. Repeat for b
CB_linked = M(C,B);
BD_linked = M(B,D);
CD_lt_b = (CB_linked * double(~BD_linked)) > 0;
CD_le_b = (~CB_linked * double(BD_linked)) < mB;
% 4. Find the intersection of CD_lt_a and CD_le_b - this is one way
% to satisfy the inequality M(a,c)+M(b,d) < M(a,d)+M(b,c)
CD_satisfy_ineq_1 = CD_lt_a & CD_le_b;
% 5. The other way to satisfy the inequality is CD_le_a & CD_lt_b
CD_satisfy_ineq_2 = CD_le_a & CD_lt_b;
inequality_feasible = any(CD_satisfy_ineq_1(:) | CD_satisfy_ineq_2(:));
Note that you can stop here if feasibility is your only concern. The complexity is A*C*D + B*C*D
, which is better than the worst-case A*B*C*D
complexity of the for loop. However, early termination means your nested for loops may still be faster in certain cases.
The next block of code enumerates all the a,b,c,d
that satisfy the inequality. It's not very well optimized (it appends to a matrix from within a loop), so it can be pretty slow if there are many results.
% 6. With (c,d) known, find a and b
% We can define these functions to help us search
find_a_lt = @(c,d) find(CA_not_linked(c,:)' & AD_linked(:,d));
find_a_le = @(c,d) find(CA_not_linked(c,:)' | AD_linked(:,d));
find_b_lt = @(c,d) find(CB_linked(c,:)' & ~BD_linked(:,d));
find_b_le = @(c,d) find(CB_linked(c,:)' | ~BD_linked(:,d));
% I'm gonna assume there aren't too many results, so I will be appending
% to an array inside of a for loop. Bad for performance, but maybe a bit
% more readable for a StackOverflow answer.
results = zeros(0,4);
% Find those that satisfy it the first way
[c_list,d_list] = find(CD_satisfy_ineq_1);
for ii = 1:numel(c_list)
c = c_list(ii); d = d_list(ii);
a = find_a_lt(c,d);
b = find_b_le(c,d);
% a,b might be vectors, in which case all combos are valid
% Many ways to find all combos, gonna use ndgrid()
[a,b] = ndgrid(a,b);
% Append these to the growing list of results
abcd = [a(:), b(:), repmat([c d],[numel(a),1])];
results = [results; abcd];
end
% Repeat for the second way
[c_list,d_list] = find(CD_satisfy_ineq_2);
for ii = 1:numel(c_list)
c = c_list(ii); d = d_list(ii);
a = find_a_le(c,d);
b = find_b_lt(c,d);
% a,b might be vectors, in which case all combos are valid
% Many ways to find all combos, gonna use ndgrid()
[a,b] = ndgrid(a,b);
% Append these to the growing list of results
abcd = [a(:), b(:), repmat([c d],[numel(a),1])];
results = [results; abcd];
end
% Remove duplicates
results = unique(results, 'rows');
% And actually these a,b,c,d will be indices into A,B,C,D because they
% were obtained from calling find() on submatrices of M.
if ~isempty(results)
results(:,1) = A(results(:,1));
results(:,2) = B(results(:,2));
results(:,3) = C(results(:,3));
results(:,4) = D(results(:,4));
end
I tested this on the following test case:
m = 1000;
A = (1:m); B = A(end)+(1:m); C = B(end)+(1:m); D = C(end)+(1:m);
M = rand(D(end),D(end)) < 1e-6; M = M | M';
I like to think that first part (see if the inequality is feasible for any a,b,c,d) worked pretty well. The other vectorized answers (that use ndgrid
or combvec
to enumerate all combinations of a,b,c,d) would require 8 terabytes of memory for a problem of this size!
But I would not recommend running the second part (enumerating all of the results) when there are more than a few hundred c,d
that satisfy the inequality, because it will be pretty damn slow.
P.S. I know I answered already, but that answer was about vectorizing such loops in general, and is less specific to your particular problem.
P.P.S. This kinda reminds me of the stable marriage problem. Perhaps some of those references would contain algorithms relevant to your problem as well. I suspect that a true graph-based algorithm could probably achieve the worst-case complexity as this while additionally offering early termination. But I think it would be difficult to implement a graph-based algorithm efficiently in MATLAB.
P.P.P.S. If you only want one of the feasible solutions, you can simplify step 6 to only return a single value, e.g.
find_a_lt = @(c,d) find(CA_not_linked(c,:)' & AD_linked(:,d), 1, 'first');
find_a_le = @(c,d) find(CA_not_linked(c,:)' | AD_linked(:,d), 1, 'first');
find_b_lt = @(c,d) find(CB_linked(c,:)' & ~BD_linked(:,d), 1, 'first');
find_b_le = @(c,d) find(CB_linked(c,:)' | ~BD_linked(:,d), 1, 'first');
if any(CD_satisfy_ineq_1)
[c,d] = find(CD_satisfy_ineq_1, 1, 'first');
a = find_a_lt(c,d);
b = find_a_le(c,d);
result = [A(a), B(b), C(c), D(d)];
elseif any(CD_satisfy_ineq_2)
[c,d] = find(CD_satisfy_ineq_2, 1, 'first');
a = find_a_le(c,d);
b = find_a_lt(c,d);
result = [A(a), B(b), C(c), D(d)];
else
result = zeros(0,4);
end