0

I have a set of 120 pairs, which are derived from a set of 16 items (i.e. c(16,2)=120).

I want to determine how many combinations of 56 pairs can be chosen from the 120 pairs, but with the constraint that each combination has to contain exactly 7 of each of the 16 items (i.e. in each subset of 56 pairs, each of the 16 items is equally represented).

In addition to determining the number of combinations, I also need to list them. Can anyone help with how I could code this in Matlab please?

Frazz
  • 2,995
  • 2
  • 19
  • 33
  • 1
    You need to tag this as Matlab if you want Matlab experts to look at it. I also suggest you add what you have tried to do yourself, otherwise the question will be seen as "give me the code". Look through the Help center and see how you can post better questions and have a better chance of getting good answers ;) – Frazz Aug 27 '14 at 06:53
  • "unclear what you're asking", guys are you serious? The question is clearly formulated and a really interesting combinatorial problem! – matheburg Aug 27 '14 at 14:15

1 Answers1

0

It is an interesting combinitorial question since nchoosek(120,56) gives 7.4149e+34 possible ways to select 56 pairs out of the 120. So you shouldn't solve this by a loop over all selections.

To handle the problem I introduce a representation of selected pairs given by a matrix A = (aij), where aij == 1 if and only if (i,j) oder (j,i) is a pair among the 56 selected ones. This matrix represents a selection of 56 pairs as described above if and only if row sums are 7, i.e. A*ones(16,1) = 7*ones(16,1) (symmetric!). It is easy to solve this systematically to print all possible solutions, but this results in about 1e20 candidates...

But according to this high number you can easily select one example randomly by some stupid code like the following:

%% find some random example
A = zeros(16,16);
b1 = ones(16,1);
b7 = 7*b1;

while norm(A*b1 - b7) ~= 0
    A = zeros(16,16);
    change = true;
    while norm(A*b1 - b7) ~= 0 && change
        change = false;
        todo = find(A*b1<7);
        idx = todo(randperm(length(todo)));
        for k = 1:2:length(idx)-1
            if A(idx(k),idx(k+1)) == 0
                change = true;
            end
            A(idx(k),idx(k+1)) = 1;
            A(idx(k+1),idx(k)) = 1;
        end        
    end    
end

%% print it
pIdx = 0;
for lIdx = 1:16
    for cIdx = 1:lIdx-1
        if A(lIdx,cIdx) == 1
            fprintf(['(' num2str(cIdx) ',' num2str(lIdx) ')  ']);
            pIdx = pIdx + 1;
            if mod(pIdx,7) == 0
                fprintf('\n');
            end            
        end
    end
end

Example result:

(1,3)  (2,4)  (1,5)  (2,5)  (2,6)  (3,6)  (4,6)  
(5,6)  (1,7)  (3,7)  (6,7)  (4,8)  (5,8)  (2,9)  
(5,9)  (6,9)  (7,9)  (8,9)  (2,10)  (3,10)  (4,10)  
(5,10)  (7,10)  (1,11)  (3,11)  (6,11)  (7,11)  (9,11)  
(1,12)  (4,12)  (7,12)  (11,12)  (3,13)  (8,13)  (9,13)  
(10,13)  (1,14)  (2,14)  (4,14)  (8,14)  (12,14)  (13,14)  
(1,15)  (3,15)  (5,15)  (8,15)  (11,15)  (12,15)  (13,15)  
(2,16)  (4,16)  (8,16)  (10,16)  (12,16)  (13,16)  (14,16)

Update: Added code to scan all possible solutions...

function main()
    %% provide n-choose-k sequences (compute only once!)
    global ncks maxResult;
    disp('Prepare n choose k sequences...');
    for n = 1:15
        fprintf([num2str(n) '...  ']);
        for k = 1:min(7,n)
            ncks{n}{k} = nchoosek_sequence(n,k);
        end
    end
    fprintf('\n');

    %% start finding solutions
    disp('Scan space of solutions...')
    A = zeros(16,16);
    maxResult = -Inf;
    tic;
    findMats(A,1)
    toc;
end

function findMats(A,curRow)
    global ncks maxResult;
    remain = 7-sum(A+A');
    if curRow == 16
        if remain(16) == 0
            A = A + A';
            if norm(sum(A)-7*ones(1,16)) ~= 0
                error('AHHH');
            end
            %A
            %pause();
            maxResult = max(YOUR_FUNCTION(A),maxResult); % <- add your function here
        end
        return;
    end
    if remain(curRow) == 0
        findMats(A,curRow+1);
    elseif remain(curRow) > 0
        remainingRows = curRow + find(remain(curRow+1:end) > 0);
        if ~isempty(remainingRows) && length(remainingRows) >= remain(curRow)
            for r = 1:nchoosek(length(remainingRows),remain(curRow))
                row = remainingRows(ncks{length(remainingRows)}{remain(curRow)}(r,:));
                Anew = A;
                Anew(curRow,row) = 1;
                findMats(Anew,curRow+1);
            end
        end
    end
end

% very simple and not optimized way to get a sequence of subset-selections
% in increasing order
%
% Note: This function was easy to implement and sufficient for our
% purposes. For more efficient implementations follow e.g.
%  http://stackoverflow.com/questions/127704/algorithm-to-return-all-combin
%  ations-of-k-elements-from-n
function s = nchoosek_sequence(N,k)
    s = zeros(nchoosek(N,k),k);
    idx = 1;
    for n=(2^k-1):(2^N-1)
        if sum(dec2bin(n)=='1') == k
            temp = dec2bin(n);
            s(idx,:) = find([zeros(1,N-length(temp)), temp=='1']);
            idx = idx + 1;
        end
    end    

    % just for convinience ;-)
    s = s(end:-1:1,:);
end

But this can take a while... ;-)

matheburg
  • 2,097
  • 1
  • 19
  • 46
  • Thanks @matheburg, that is extremely helpful. My next challenge is a simple sum maximization. Each pair actually corresponds to a value and I need to find the combination of 56 pairs which maximizes the sum of these values. Obviously, this is easy enough with a manageable set of combinations, but with 10e+20 combinations things are not so straightforward. I don't actually need the maximum sum value defined exactly, just some sense of what it is. Therefore, I am contemplating looping your code and approximating the maximum based on this simulation approach? Any thoughts? Thanks! – Jordi McKenzie Aug 29 '14 at 06:08
  • To just get a sense I would loop the code (as you said) and save EACH result. Then you can plot the distribution and get an idea whether this is a sufficient approach or whether you should run over all candidates. I am not exactly sure how many solutions exist, but the most simple complete iteration algorithm I can think of would have to run over ~1e18 candidates. This could be possible, so if required we can go on with discussion ;-) Btw: How expensive is your function you evaluate in each iteration? – matheburg Aug 29 '14 at 06:24
  • I first need to replace each of the 56 pairs with a single (scalar) value which I have for each of 120 pairs (something like a 'vlookup' in excel I guess). Then I need to sum over all the 56 values to generate the 'sum value' as I refer to it above. So in terms of computing, very simple. I should probably also mention, I am not a programmer/mathematician but an economist so this is all rather new territory for me and your help is gratefully appreciated (I will certainly credit your help on our paper too... which I can do to your pseudonym or real name if you'd prefer). Thanks again!:) – Jordi McKenzie Aug 29 '14 at 07:06
  • This would be an honour, I will send you a private message. I have added code to scan the full space of solutions as well. BUT: This can take forever (I am currently running it and it gave me 1.3Mio solutions already). So if it takes too long you should iterate the random one and watch the distribution to get an idea about the maximum. Btw: If you run the code you should add some output that gives you a sign whether your run is still in progress ;-) – matheburg Aug 29 '14 at 10:23