There are cases where Amir's code fails.
Consider the following m1:
0 0 1
0 1 1
1 0 1
The best solution is to draw vertical lines in the first two columns.
Amir's code would give the following m2:
-2 -2 0
2 0 0
0 2 0
And the result would draw the two vertical lines AS WELL AS a line in the first row.
It seems to me the problem is the tie-breaking case:
return vertical > horizontal ? vertical : horizontal * -1;
Because of the way the code is written, the very similar m1 will NOT fail:
0 1 1
1 0 1
0 0 1
Where the first row is moved to the bottom, because the clearing function will clear the -2 values from m2 before those cells are reached. In the first case, the -2 values are hit first, so a horizontal line is drawn through the first row.
I've been working a little through this, and this is what I have. In the case of a tie, do not set any value and do not draw a line through those cells. This covers the case of the matrix I mentioned above, we are done at this step.
Clearly, there are situations where there will remain 0s that are uncovered. Below is another example of a matrix that will fail in Amir's method (m1):
0 0 1 1 1
0 1 0 1 1
0 1 1 0 1
1 1 0 0 1
1 1 1 1 1
The optimal solution is four lines, e.g. the first four columns.
Amir's method gives m2:
3 -2 0 0 0
3 0 -2 0 0
3 0 0 -2 0
0 0 -2 -2 0
0 0 0 0 0
Which will draw lines at the first four rows and the first column (an incorrect solution, giving 5 lines). Again, the tie-breaker case is the issue. We solve this by not setting a value for the ties, and iterating the procedure.
If we ignore the ties we get an m2:
3 -2 0 0 0
3 0 0 0 0
3 0 0 0 0
0 0 0 0 0
0 0 0 0 0
This leads to covering only the first row and the first column. We then take out the 0s that are covered to give new m1:
1 1 1 1 1
1 1 0 1 1
1 1 1 0 1
1 1 0 0 1
1 1 1 1 1
Then we keep repeating the procedure (ignoring ties) until we reach a solution. Repeat for a new m2:
0 0 0 0 0
0 0 2 0 0
0 0 0 2 0
0 0 0 0 0
0 0 0 0 0
Which leads to two vertical lines through the second and third columns. All 0s are now covered, needing only four lines (this is an alternative to lining the first four columns). The above matrix only needs 2 iterations, and I imagine most of these cases will need only two iterations unless there are sets of ties nested within sets of ties. I tried to come up with one, but it became difficult to manage.
Sadly, this is not good enough, because there will be cases that will remain tied forever. Particularly, in cases where there is a 'disjoint set of tied cells'. Not sure how else to describe this except to draw the following two examples:
0 0 1 1
0 1 1 1
1 0 1 1
1 1 1 0
or
0 0 1 1 1
0 1 1 1 1
1 0 1 1 1
1 1 1 0 0
1 1 1 0 0
The upper-left 3x3 sub-matrices in these two examples are identical to my original example, I have added 1 or 2 rows/cols to that example at the bottom and right. The only newly added zeros are where the new rows and columns cross. Describing for clarity.
With the iterative method I described, these matrices will be caught in an infinite loop. The zeros will always remain tied (col-count vs row-count). At this point, it does make sense to just arbitrarily choose a direction in the case of a tie, at least from what I can imagine.
The only issue I'm running into is setting up the stopping criteria for the loop. I can't assume that 2 iterations is enough (or any n), but I also can't figure out how to detect if a matrix has only infinite loops left within it. I'm still not sure how to describe these disjoint-tied-sets computationally.
Here is the code to do what I have come up with so far (in MATLAB script):
function [Lines, AllRows, AllCols] = FindMinLines(InMat)
%The following code finds the minimum set of lines (rows and columns)
%required to cover all of the true-valued cells in a matrix. If using for
%the Hungarian problem where 'true-values' are equal to zero, make the
%necessary changes. This code is not complete, since it will be caught in
%an infinite loop in the case of disjoint-tied-sets
%If passing in a matrix where 0s are the cells of interest, uncomment the
%next line
%InMat = InMat == 0;
%Assume square matrix
Count = length(InMat);
Lines = zeros(Count);
%while there are any 'true' values not covered by lines
while any(any(~Lines & InMat))
%Calculate row-wise and col-wise totals of 'trues' not-already-covered
HorzCount = repmat(sum(~Lines & InMat, 2), 1, Count).*(~Lines & InMat);
VertCount = repmat(sum(~Lines & InMat, 1), Count, 1).*(~Lines & InMat);
%Calculate for each cell the difference between row-wise and col-wise
%counts. I.e. row-oriented cells will have a negative number, col-oriented
%cells will have a positive numbers, ties and 'non-trues' will be 0.
%Non-zero values indicate lines to be drawn where orientation is determined
%by sign.
DiffCounts = VertCount - HorzCount;
%find the row and col indices of the lines
HorzIdx = any(DiffCounts < 0, 2);
VertIdx = any(DiffCounts > 0, 1);
%Set the horizontal and vertical indices of the Lines matrix to true
Lines(HorzIdx, :) = true;
Lines(:, VertIdx) = true;
end
%compute index numbers to be returned.
AllRows = [find(HorzIdx); find(DisjTiedRows)];
AllCols = find(VertIdx);
end