Fastest solution for all possible combinations, taking k elements out of n possible with k>2 and n large

Question

I am using MATLAB to find all of the possible combinations of k elements out of n possible elements. I stumbled across this question, but unfortunately it does not solve my problem. Of course, neither does nchoosek as my n is around 100.

Truth is, I don't need all of the possible combinations at the same time. I will explain what I need, as there might be an easier way to achieve the desired result. I have a matrix M of 100 rows and 25 columns.

Think of a submatrix of M as a matrix formed by ALL columns of M and only a subset of the rows. I have a function f that can be applied to any matrix which gives a result of either -1 or 1. For example, you can think of the function as sign(det(A)) where A is any matrix (the exact function is irrelevant for this part of the question).

I want to know what is the biggest number of rows of M for which the submatrix A formed by these rows is such that f(A) = 1. Notice that if f(M) = 1, I am done. However, if this is not the case then I need to start combining rows, starting of all combinations with 99 rows, then taking the ones with 98 rows, and so on.

Up to this point, my implementation had to do with nchoosek which worked when M had only a few rows. However, now that I am working with a relatively bigger dataset, things get stuck. Do any of you guys think of a way to implement this without having to use the above function? Any help would be gladly appreciated.

Here is my minimal working example, it works for small obs_tot but fails when I try to use bigger numbers:

value = -1; obs_tot = 100; n_rows = 25;
mat = randi(obs_tot,n_rows);
while value == -1
    posibles = nchoosek(1:obs_tot,i);
    [num_tries,num_obs] = size(possibles);
    num_try = 1;
        while value == 0 && num_try <= num_tries
        check = mat(possibles(num_try,:),:);
        value = sign(det(check));
        num_try = num_try + 1;
        end
    i = i - 1;
end
obs_used = possibles(num_try-1,:)';

So, you said that you got som problems with `nchoosek` for larger number of rows. I guess that you understand as well as us that it is because the number of combiations get larger. However I would like to ask here, how many `i` do you expect that you need to test here? I have tried to do some worst case estimation of number of combinations and that is for selecting 50 rows out of 100. The number of possible combinations can be calculated as 100!/(50!*50!) which gives a maximum number of combinations of about 1.0*10^29. A loop from 1 to 1.0*10^29 is quite heavy I am afraid. — patrik, Mar 19 '15 at 07:45
Yes, I understand that is the issue. My prior was that maybe 10-20 iterations tops would suffice, but I cannot be sure of this. What I am thinking now is that there could be some good guestimate. For example, sampling one observation and seeing whether the condition holds. Then sampling another observation, adding it and checking it again, if my condition fails then sample another observation and try again and so on. This can be done many times without the process exploding. — MathUser, Mar 19 '15 at 09:02
So this means then that you do not want all possible combinations of `k` rows from 100 possible then? Or am I getting this the wrong way now? — patrik, Mar 19 '15 at 09:12
@patrik: his 4th paragraph explained what he wants. @MathUser: certainly the thinking of using `nchoosek` is not possible since your `n` is too big even though it's the most clearly way. Maybe you can try (i.e. analytic?) to simplify your calculation/reduce `n`/limit `k`, or thinking about something more `adaptive` ^_^ — scmg, Mar 19 '15 at 09:47
Don't get your hopes up for getting a real alternative to `nchoosek`. There are implementations of `nchoosek` that are a bit faster, but your resulting matrices will be so big that this won't matter. If this can't be simplified mathematically you should go with a heuristic. I assume you have already verified that your problem is not solvable by a simple greedy approach? Why not just randomly generate combinations starting from size `k=1` and increase `k` with every found `f(A)=1`?(Like with `comb=randperm(n,k)`.) This way you at least have a slight chance of finding a large `k`. — knedlsepp, Mar 19 '15 at 11:06
@scmg what OP writes in paragraph 4 does rapidly become unsolvable due to computational complexity. Already at k=10 the problem have 10^13 possible combinations, which means that looping through them all is not an option. However, OP indicated that he have some way that can reduce number of combinations. However, it is hard to give a proper answer unless I know how this is done. — patrik, Mar 19 '15 at 13:44
I will try to answer all of your questions. The logical condition that I am trying to prove is whether there is a row vector *x* such that it has all non-negative coordinates and that when multiplied by the submatrix *A* gives *xA=0*. What I am trying to do is to find the biggest submatrix such that this condition does not hold and I already have an algorithm to test the condition. There are indeed properties on the matrix *M* that may help me simplify the problem. I believe that a random approach certainly may work. — MathUser, Mar 20 '15 at 04:31

CitizenInsane · Accepted Answer · 2018-04-29T06:12:52.893

Preamble

As yourself noticed in your question, it would be nice not to have nchoosek to return all possible combinations at the same time but rather to enumerate them one by one in order not to explode memory when n becomes large. So something like:

enumerator = CombinationEnumerator(k, n);
while(enumerator.MoveNext())
    currentCombination = enumerator.Current;
    ...
end

Here is an implementation of such enumerator as a Matlab class. It is based on classic IEnumerator<T> interface in C# / .NET and mimics the subfunction combs in nchoosek (the unrolled way):

%
% PURPOSE:
%
%   Enumerates all combinations of length 'k' in a set of length 'n'.
%
% USAGE:
%
%   enumerator = CombinaisonEnumerator(k, n);
%   while(enumerator.MoveNext())
%       currentCombination = enumerator.Current;
%       ...
%   end
%

%% ---
classdef CombinaisonEnumerator  < handle

    properties (Dependent) % NB: Matlab R2013b bug => Dependent must be declared before their get/set !
        Current; % Gets the current element.
    end

    methods
        function [enumerator] = CombinaisonEnumerator(k, n)
        % Creates a new combinations enumerator.

            if (~isscalar(n) || (n < 1) || (~isreal(n)) || (n ~= round(n))), error('`n` must be a scalar positive integer.'); end
            if (~isscalar(k) || (k < 0) || (~isreal(k)) || (k ~= round(k))), error('`k` must be a scalar positive or null integer.'); end
            if (k > n), error('`k` must be less or equal than `n`'); end

            enumerator.k = k;
            enumerator.n = n;
            enumerator.v = 1:n;            
            enumerator.Reset();

        end
        function [b] = MoveNext(enumerator)
        % Advances the enumerator to the next element of the collection.

            if (~enumerator.isOkNext), 
                b = false; return; 
            end

            if (enumerator.isInVoid)
                if (enumerator.k == enumerator.n),
                    enumerator.isInVoid = false;
                    enumerator.current = enumerator.v;
                elseif (enumerator.k == 1)
                    enumerator.isInVoid = false;
                    enumerator.index = 1;
                    enumerator.current = enumerator.v(enumerator.index);                    
                else
                    enumerator.isInVoid = false;
                    enumerator.index = 1;
                    enumerator.recursion = CombinaisonEnumerator(enumerator.k - 1, enumerator.n - enumerator.index);
                    enumerator.recursion.v = enumerator.v((enumerator.index + 1):end); % adapt v (todo: should use private constructor)
                    enumerator.recursion.MoveNext();
                    enumerator.current = [enumerator.v(enumerator.index) enumerator.recursion.Current]; 
                end
            else
                if (enumerator.k == enumerator.n),
                    enumerator.isInVoid = true;
                    enumerator.isOkNext = false;
                elseif (enumerator.k == 1)
                    enumerator.index = enumerator.index + 1;
                    if (enumerator.index <= enumerator.n)
                        enumerator.current = enumerator.v(enumerator.index);
                    else 
                        enumerator.isInVoid = true;
                        enumerator.isOkNext = false;
                    end                                       
                else
                    if (enumerator.recursion.MoveNext())
                        enumerator.current = [enumerator.v(enumerator.index) enumerator.recursion.Current];
                    else
                        enumerator.index = enumerator.index + 1;
                        if (enumerator.index <= (enumerator.n - enumerator.k + 1))
                            enumerator.recursion = CombinaisonEnumerator(enumerator.k - 1, enumerator.n - enumerator.index);
                            enumerator.recursion.v = enumerator.v((enumerator.index + 1):end); % adapt v (todo: should use private constructor)
                            enumerator.recursion.MoveNext();
                            enumerator.current = [enumerator.v(enumerator.index) enumerator.recursion.Current];
                        else 
                            enumerator.isInVoid = true;
                            enumerator.isOkNext = false;
                        end                                                               
                    end
                end
            end

            b = enumerator.isOkNext;

        end
        function [] = Reset(enumerator)
        % Sets the enumerator to its initial position, which is before the first element.

            enumerator.isInVoid = true;
            enumerator.isOkNext = (enumerator.k > 0);

        end
        function [c] = get.Current(enumerator)
            if (enumerator.isInVoid), error('Enumerator is positioned (before/after) the (first/last) element.'); end
            c = enumerator.current;
        end
    end

    properties (GetAccess=private, SetAccess=private)
        k = [];
        n = [];
        v = [];
        index = [];
        recursion = [];
        current = [];
        isOkNext = false;
        isInVoid = true;
    end

end

We can test implementation is ok from command window like this:

>> e = CombinaisonEnumerator(3, 6);
>> while(e.MoveNext()), fprintf(1, '%s\n', num2str(e.Current)); end

Which returns as expected the following n!/(k!*(n-k)!) combinations:

Implementation of this enumerator may be further optimized for speed, or by enumerating combinations in an order more appropriate for your case (e.g., test some combinations first rather than others) ... Well, at least it works! :)

Problem solving

Now solving your problem is really easy:

n = 100;
m = 25;
matrix = rand(n, m);

k = n;
cont = true;
while(cont && (k >= 1))

    e = CombinationEnumerator(k, n);
    while(cont && e.MoveNext());

       cont = f(matrix(e.Current(:), :)) ~= 1;

    end

    if (cont), k = k - 1; end 

end

This is an absolutely amazing workaround for larger Ns and I'm surprised it doesn't have more upvotes. One question though. I'm trying to implement this with your own example. In your problem solving section the line `cont = f(matrix(e.Current(:), :) ~= 1;` is missing a parentheses, preventing me from running the code. I tried adding it basically everywhere but nowhere works. I'm not exactly sure what `f()` means in this case and don't want to break this. Can you clarify what that line is supposed to do or where the parenthesis goes (I'm so ashamed to request this clarification). — chainhomelow, Apr 28 '18 at 23:49
@chainhomelow Thanks for your up-vote and information about missing parenthesis (i edited my post to fix this). The example is pseudo-code only to show how to use my enumerator in that particular case. Function `f` is undefined (you have to replace it by whatever function you want). @MathUser in his question only specified that it was returning `-1` or `1` and that code should stop when it returns `1`. — CitizenInsane, Apr 29 '18 at 06:23
Ahhh thanks for the clarification. I have not worked with user-built classes before and was afraid it has something to do with its structure. Appreciate your time. — chainhomelow, Apr 29 '18 at 18:27

Fastest solution for all possible combinations, taking k elements out of n possible with k>2 and n large

1 Answers1