4

I would like to construct a function

[B, ind] = extract_ones(A)

which removes some sub-arrays from a binary array A in arbitrary dimensions, such that the remaining array B is the largest possible array with only 1's, and I also would like to record in ind that where each of the 1's in B comes from.


Example 1

Assume A is a 2-D array as shown

A = 
1 1 0 0 0 1
1 1 1 0 1 1
0 0 0 1 0 1
1 1 0 1 0 1
1 1 0 1 0 1
1 1 1 1 1 1

After removing A(3,:) and A(:,3:5), we have the output B

B = 
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1

which is the largest array with only ones by removing rows and columns of A. As the fifteen 1's of B corresponds to

A(1,1) A(1,2) A(1,6)
A(2,1) A(2,2) A(2,6)
A(4,1) A(4,2) A(4,6)
A(5,1) A(5,2) A(5,6)
A(6,1) A(6,2) A(6,6)

respectively, or equivalently

A(1) A(7)  A(31)
A(2) A(8)  A(32)
A(4) A(10) A(34)
A(5) A(11) A(35)
A(6) A(12) A(36)

so, the output ind looks like (of course ind's shape does not matter):

ind = [1 2 4 5 6 7 8 10 11 12 31 32 34 35 36]

Example 2

If the input A is constructed by

A = ones(6,3,4,3);
A(2,2,2,2) = 0;
A(4,1,3,3) = 0;
A(1,1,4,2) = 0;
A(1,1,4,1) = 0;

Then, by deleting the minimum cuboids containing A(2,2,2,2), A(4,1,3,3), A(1,1,4,3) and A(1,1,4,1), i.e. after deleting these entries

A(2,:,:,:)
A(:,1,:,:)

Then the remaining array B will be composed by 1's only. And the ones in B corresponds to

A([1,3:6],2:3,1:4,1:3)

So, the output ind lists the subscripts transformed into indices, i.e.

ind = [7    9   10  11  12  13  15  16  17  18  25  27  28  29  30  31  33  34  35  36  43  45  46  47  48  49  51  52  53  54  61  63  64  65  66  67  69  70  71  72  79  81  82  83  84  85  87  88  89  90  97  99  100 101 102 103 105 106 107 108 115 117 118 119 120 121 123 124 125 126 133 135 136 137 138 139 141 142 143 144 151 153 154 155 156 157 159 160 161 162 169 171 172 173 174 175 177 178 179 180 187 189 190 191 192 193 195 196 197 198 205 207 208 209 210 211 213 214 215 216]

As the array needed to be processed as above is in 8-D, and it should be processed more than once, so can anyone give me opinions on how to composing the program doing this task fast?


My work so far [Added at 2 am (GMT-4), 2nd Aug 2017]

My idea was that I delete the sub-arrays with the largest proportion of zero one by one. And here is my work so far:

Inds = reshape(1:numel(A),size(A)); % Keep track on which 1's survive.
cont = true;

while cont
    sz = size(A);
    zero_percentage = 0;
    Test_location = [];

    % This nested for loops are for determining which sub-array of A has the
    % maximum proportion of zeros.
    for J = 1 : ndims(A)
        for K = 1 : sz(J)
            % Location is in the form of (_,_,_,...,_)
            % where the J-th blank is K, the other blanks are colons.
            Location = strcat('(',repmat(':,',1,(J-1)),int2str(K),repmat(',:',1,(ndims(A)-J)),')');
            Test_array = eval(strcat('A',Location,';'));
            N = numel(Test_array);
            while numel(Test_array) ~= 1
                Test_array = sum(Test_array);
            end
            test_zero_percentage = 1 - (Test_array/N);
            if test_zero_percentage > zero_percentage
                zero_percentage = test_zero_percentage;
                Test_location = Location;
            end
        end
    end

    % Delete the array with maximum proportion of zeros
    eval(strcat('A',Test_location,'= [];'))
    eval(strcat('Inds',Test_location,'= [];'))

    % Determine if there are still zeros in A. If there are, continue the while loop.
    cont = A;
    while numel(cont) ~= 1
        cont = prod(cont);
    end
    cont = ~logical(cont);
end

But I encountered two problems:

1) It may be not efficient to check all arrays in all sub-dimensions one-by-one.

2) The result does not contain the most number of rectangular ones. for example, I tested my work using a 2-dimensional binary array A

A = 
 0     0     0     1     1     0
 0     1     1     0     1     1
 1     0     1     1     1     1
 1     0     0     1     1     1
 0     1     1     0     1     1
 0     1     0     0     1     1
 1     0     0     0     1     1
 1     0     0     0     0     0

It should return me the result as

B = 
 1     1
 1     1
 1     1
 1     1
 1     1
 1     1

Inds =
 34    42
 35    43
 36    44
 37    45
 38    46
 39    47

But, instead, the code returned me this:

B =
 1     1     1
 1     1     1
 1     1     1

Inds =
 10    34    42
 13    37    45
 14    38    46

*My work so far 2 [Added at 12noon (GMT-4), 2nd Aug 2017]

Here is my current amendment. This may not provide the best result. This may give a fairly OK approximation to the problem, and this does not give empty Inds. But I am still hoping that there is a better solution.

function [B, Inds] = Finding_ones(A)

Inds = reshape(1:numel(A),size(A)); % Keep track on which 1's survive.
sz0 = size(A);
cont = true;

while cont
    sz = size(A);
    zero_percentage = 0;
    Test_location = [];

    % This nested for loops are for determining which sub-array of A has the
    % maximum proportion of zeros.
    for J = 1 : ndims(A)
        for K = 1 : sz(J)
            % Location is in the form of (_,_,_,...,_)
            % where the J-th blank is K, the other blanks are colons.
            Location = strcat('(',repmat(':,',1,(J-1)),int2str(K),repmat(',:',1,(ndims(A)-J)),')');
            Test_array = eval(strcat('A',Location,';'));
            N = numel(Test_array);
            Test_array = sum(Test_array(:));

            test_zero_percentage = 1 - (Test_array/N);
            if test_zero_percentage > zero_percentage
                eval(strcat('Testfornumel = numel(A',Location,');'))
                if Testfornumel < numel(A) % Preventing the A from being empty
                    zero_percentage = test_zero_percentage;
                    Test_location = Location;
                end
            end
        end
    end

    % Delete the array with maximum proportion of zeros
    eval(strcat('A',Test_location,'= [];'))
    eval(strcat('Inds',Test_location,'= [];'))

    % Determine if there are still zeros in A. If there are, continue the while loop.
    cont = A;
    while numel(cont) ~= 1
        cont = prod(cont);
    end
    cont = ~logical(cont);
end

B = A;

% command = 'i1, i2, ... ,in'
% here, n is the number of dimansion of A.
command = 'i1';
for J = 2 : length(sz0)
    command = strcat(command,',i',int2str(J));
end

Inds = reshape(Inds,numel(Inds),1); %#ok<NASGU> 
eval(strcat('[',command,'] = ind2sub(sz0,Inds);'))

% Reform Inds into a 2-D matrix, which each column indicate the location of
% the 1 originated from A.
Inds = squeeze(eval(strcat('[',command,']')));
Inds = reshape(Inds',length(sz0),numel(Inds)/length(sz0));

end
Leba
  • 109
  • 7

2 Answers2

2

It seems a difficult problem to solve, since the order of deletion can change a lot in the final result. If in your first example you start with deleting all the columns that contain a 0, you don't end up with the desired result.

The code below removes the row or column with the most zeros and keeps going until it's only ones. It keeps track of the rows and columns that are deleted to find the indexes of the remaining ones.

function [B,ind] = extract_ones( A )
if ~islogical(A),A=(A==1);end
if ~any(A(:)),B=[];ind=[];return,end
B=A;cdel=[];rdel=[];
while ~all(B(:))
    [I,J] = ind2sub(size(B),find(B==0));
    ih=histcounts(I,[0.5:1:size(B,1)+0.5]); %zero's in rows
    jh=histcounts(J,[0.5:1:size(B,2)+0.5]); %zero's in columns
    if max(ih)>max(jh)
        idxr=find(ih==max(ih),1,'first');
        B(idxr,:)=[];
        %store deletion
        rdel(end+1)=idxr+sum(rdel<=idxr);
    elseif max(ih)==max(jh)
        idxr=find(ih==max(ih),1,'first');
        idxc=find(jh==max(jh),1,'first');
        B(idxr,:)=[];
        B(:,idxc)=[];
        %store deletions
        rdel(end+1)=idxr+sum(rdel<=idxr);
        cdel(end+1)=idxc+sum(cdel<=idxc);
    else
        idxc=find(jh==max(jh),1,'first');
        B(:,idxc)=[];
        %store deletions
        cdel(end+1)=idxc+sum(cdel<=idxc);
    end
end
A(rdel,:)=0;
A(:,cdel)=0;
ind=find(A);
EBH
  • 10,350
  • 3
  • 34
  • 59
Gelliant
  • 1,835
  • 1
  • 11
  • 23
  • Thank you for your reply, and sorry for missing my current progress in the post. I have just added it in the original post. I do not know how to check if the result is composed of the maximum number of ones or not, but your result has more ones than mine, and your code has no nested loops, so I think your code is better than mine. Thank you very much!! – Leba Aug 02 '17 at 06:37
  • Hello Gelliant, I have taken a look your codes in detail. Sorry but I found that the code does not trim the matrix into rectangular or hyper-cuboid forms. – Leba Aug 02 '17 at 08:16
  • Indeed. My code is a bit too simple for your problem I'm afraid. What might work is a seed&grow approach. You choose a starting point (must be a one) and try to grow alternating in each dimension to get a region. When a dimension hits a zero you stop growth. When the algorithm stops you note the size and move to a next seed that has not been in any of the growing areas. Stop when all seeds have been tested. Note the highest area. – Gelliant Aug 03 '17 at 07:59
  • And try not to use eval in your code, it is evil and should be avoided at all cost. https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice – Gelliant Aug 03 '17 at 08:02
  • Understood Gelliant. Seed & grow approach sounds work efficiently for me. I have heard that eval worsen the efficient of the code so much. But I knew not enough functions or commands for me to complete the related part. If it does not bother you, may I have any opinions on how to replace the "eval"? – Leba Aug 04 '17 at 09:40
  • have you seen my second answer below? there I show a seed & grow approach without using eval. – Gelliant Aug 04 '17 at 18:59
  • Oh sorry, I meant if the eval in my code can be eliminated or not. But your code in another approach works better and without eval, that question does not matter actually. – Leba Aug 06 '17 at 03:21
1

Second try: Start with a seed point and try to grow the matrix in all dimensions. The result is the start and finish point in the matrix.

function [ res ] = seed_grow( A )
if ~islogical(A),A=(A==1);end
if ~any(A(:)),res={};end
go = true;
dims=size(A);
ind = cell([1 length(dims)]);  %cell to store find results
seeds=A;maxmat=0;
while go %main loop to remove all posible seeds
    [ind{:}]=find(seeds,1,'first');
    S = [ind{:}]; %the seed
    St = [ind{:}]; %the end of the seed
    go2=true;
    val_dims=1:length(dims);
    while go2 %loop to grow each dimension
        D=1;
        while D<=length(val_dims) %add one to each dimension
            St(val_dims(D))=St(val_dims(D))+1;
            I={};
            for ct = 1:length(S),I{ct}=S(ct):St(ct);end %generate indices
            if St(val_dims(D))>dims(val_dims(D))
                res=false;%outside matrix
            else
                res=A(I{:});
            end 
            if ~all(res(:)) %invalid addition to dimension
                St(val_dims(D))=St(val_dims(D))-1; %undo
                val_dims(D)=[]; D=D-1; %do not try again
                if isempty(val_dims),go2=false;end %end of growth
            end
            D=D+1;
        end
    end
    %evaluate the result
    mat = prod((St+1)-S); %size of matrix
    if mat>maxmat
        res={S,St};
        maxmat=mat;
    end
    %tried to expand, now remove seed option
    for ct = 1:length(S),I{ct}=S(ct):St(ct);end %generate indices
    seeds(I{:})=0;    
    if ~any(seeds),go=0;end
end
end

I tested it using your matrix:

A =  [0     0     0     1     1     0
 0     1     1     0     1     1
 1     0     1     1     1     1
 1     0     0     1     1     1
 0     1     1     0     1     1
 0     1     0     0     1     1
 1     0     0     0     1     1
 1     0     0     0     0     0];
[ res ] = seed_grow( A );
for ct = 1:length(res),I{ct}=res{1}(ct):res{2}(ct);end %generate indices
B=A(I{:});
idx = reshape(1:numel(A),size(A)); 
idx = idx(I{:});

And got the desired result:

B =

     1     1
     1     1
     1     1
     1     1
     1     1
     1     1


idx =

    34    42
    35    43
    36    44
    37    45
    38    46
    39    47
Gelliant
  • 1,835
  • 1
  • 11
  • 23