5

I'm trying to implement an algorithm to get all combinations of k elements out of a set of n elements where the difference between two consecutive combinations are maximized (so kind of reverse Gray codes). In other words, the combinations should be ordered to avoid elements from appearing twice in a row, and so that no element is unnecessarily discriminated.

Ideally, the algorithm would also NOT pre-calculate all combinations and store them into memory, but rather deliver combinations on demand. I have searched extensively for this and found a few detailed answers such as https://stackoverflow.com/a/127856/1226020, but I can't seem to apply this. Also, many of the articles linked in that answer are paid content.

To illustrate what I mean:

From a set of [0, 1, 2, 3, 4], find all combinations of two elements. Using a simple algorithm that tries to increment the right-most element until no longer possible, then moving left, incrementing the previous digit etc, I get the following results:

[0, 1]
[0, 2]
[0, 3]
[0, 4]
[1, 2]
[1, 3]
[1, 4]
[2, 3]
[2, 4]
[3, 4]

I produce this result with the following Java code:

public class CombinationGenerator {
    private final int mNrElements;
    private final int[] mCurrentCombination;

    public CombinationGenerator(int n, int k) {
        mNrElements = n;
        mCurrentCombination = new int[k];

        initElements(0, 0);
        // fake initial state in order not to miss first combination below
        mCurrentCombination[mCurrentCombination.length - 1]--;
    }

    private void initElements(int startPos, int startValue) {
        for (int i = startPos; i < mCurrentCombination.length; i++) {
            mCurrentCombination[i] = i + startValue - startPos;
        }
    }

    public int[] getNextCombination() {
        for (int i = 0; i < mCurrentCombination.length; i++) {
            int pos = mCurrentCombination.length - 1 - i;

            if (mCurrentCombination[pos] < mNrElements - 1 - i) {
                initElements(pos, mCurrentCombination[pos] + 1);
                return mCurrentCombination;
            }
        }

        return null;
    }

    public static void main(String[] args) {
        CombinationGenerator cg = new CombinationGenerator(5, 2);
        int[] c;

        while ((c = cg.getNextCombination()) != null) {
            System.out.println(Arrays.toString(c));
        }
    }

}

This is not what I want, because I want each consecutive combination to be as different as possible from the previous one. Currently, element "1" appears four times in a row, and then never again. For this particular example, one solution would be:

[0, 1]
[2, 3]
[0, 4]
[1, 2]
[3, 4]
[0, 2]
[1, 3]
[2, 4]
[0, 3]
[1, 4]

I have indeed managed to accomplish this result for this particular (7,4) case by applying a sorting algoritm after the combinations are generated, but this does not fulfill my requirement of on-demand combination generation as the entire set of combinations has to be generated at once, then sorted and kept in memory. I'm not sure it works for arbitrary k and n values either. And finally, I'm pretty sure it's not the most efficient way as the sorting algorithm basically loops through the set of combinations trying to find one sharing no elements with the previous combination. I also considered keeping a table of "hit counts" for each element and use that to always get the next combination containing the lowest combined hit count. My somewhat empirical conclusion is that it will be possible to avoid elements from completely appearing in two consecutive combinations if n > 2k. Otherwise, it should at least be possible to avoid elements from appearing more than twice in a row etc.

You could compare this problem to what is achieved for k = 2 using a standard round-robin scheme for football games etc, but I need a solution for arbitrary values of k. We can imagine this to be a tournament of some sort of game where we have n players that are to play against all other players in a set of games, each game holding k players. Players should as far as possible not have to play two games in a row, but should also not have to wait unnecessarily long between two game appearances.

Any pointers on how to solve this either with a reliable sorting algorithm post generation, or - preferably - on-demand, would be awesome!

Note: Let's typically assume that n <= 50, k <= 5

Thanks

Community
  • 1
  • 1
JHH
  • 8,567
  • 8
  • 47
  • 91
  • 1
    I suspect that running the Angluin--Valiant local search algorithm for Hamiltonian cycle on the combinations graph would be effective for small parameter settings, if somewhat brutal in resource usage. – David Eisenstat Mar 10 '15 at 14:20
  • @DavidEisenstat how do you guarantee that there are no "close" edges in the final output? If I understand the problem correctly, the desired goal is to have all adjacent combinations separated by *at least* X positions - making many of them very different while having some very close would not be considered good. – tucuxi Mar 10 '15 at 14:51
  • 1
    @tucuxi The combinations graph has edges wherever it's OK for the combinations to appear back to back in the output. – David Eisenstat Mar 10 '15 at 14:52
  • How can we know the maximum minimum distance that is achievable given a particular n and k? An upper limit is min(k, n-k), but I am not sure that it is always achievable. If it is, then @DavidEisenstat's idea of building a graph with admissible edges and finding a hamiltonian path looks good (if expensive). – tucuxi Mar 10 '15 at 15:01
  • Thanks a lot guys. I have to admit it's been a while since I studied combinatorics, so I probably have to put in quite an effort to grasp this. Thanks a lot for giving me some pointers! – JHH Mar 10 '15 at 21:04

2 Answers2

2

Quick & dirty working code working on @DavidEisenstat's suggestion:

public static void main(String[] args) {
    ArrayList<int[]> all = new ArrayList<int[]>();
    // output is 0 if distance(i, j) != max, and 1 otherwise
    int[][] m = buildGraph(7, 4, all);
    HamiltonianCycle hc = new HamiltonianCycle();
    int path[] = hc.findHamiltonianCycle(m);
    if (path != null) {
        // I have no proof that such a path will always exist
        for (int i : path) {
            System.out.println(Arrays.toString(all.get(i)));
        }
    }
}

Output for above code (7,4); distance (as length - size_of_intersection) is always 3; trying to use 4 would lead to a disconnected graph:

    [0, 1, 2, 3]
    [0, 4, 5, 6]
    [1, 2, 3, 4]
    [0, 1, 5, 6]
    [0, 2, 3, 4]
    [1, 2, 5, 6]
    [0, 1, 3, 4]
    [0, 2, 5, 6]
    [1, 3, 4, 5]
    [0, 1, 2, 6]
    [0, 3, 4, 5]
    [1, 2, 3, 6]
    [0, 1, 4, 5]
    [0, 2, 3, 6]
    [1, 4, 5, 6]
    [0, 2, 3, 5]
    [1, 2, 4, 6]
    [0, 3, 5, 6]
    [1, 2, 4, 5]
    [0, 3, 4, 6]
    [1, 2, 3, 5]
    [0, 2, 4, 6]
    [1, 3, 5, 6]
    [0, 2, 4, 5]
    [1, 3, 4, 6]
    [0, 1, 2, 5]
    [2, 3, 4, 6]
    [0, 1, 3, 5]
    [2, 4, 5, 6]
    [0, 1, 3, 6]
    [2, 3, 4, 5]
    [0, 1, 4, 6]
    [2, 3, 5, 6]
    [0, 1, 2, 4]
    [3, 4, 5, 6]

Missing bits of code:

// uses JHH's code to build sequences, stores it in 'all'
public static int[][] buildGraph(int n, int k, ArrayList<int[]> all) {
    SequenceGenerator sg = new SequenceGenerator(n, k);
    int[] c;
    while ((c = sg.getNextCombination()) != null) {
        all.add(c.clone());         
    }
    int best = Math.min(n-k, k);
    System.out.println("Best is " + best);
    int matrix[][] = new int[all.size()][];
    for (int i=0; i<matrix.length; i++) {
        matrix[i] = new int[all.size()];
        for (int j=0; j<i; j++) {
            int d = distance(all.get(j), all.get(i));
            matrix[i][j] = matrix[j][i] = (d != best)? 0 : 1;
        }           
    }
    return matrix;
}

Distance: (not efficient at all, but dwarfed by cost of hamiltonian calculation)

public static int distance(int[] a, int[] b) {
        HashSet<Integer> ha = new HashSet<Integer>();
        HashSet<Integer> hb = new HashSet<Integer>();
        for (int i=0; i<a.length; i++) {
                ha.add(a[i]);
                hb.add(b[i]);
        }
        ha.retainAll(hb);
        return a.length - ha.size();
}

And for finding the hamiltonian, I modified code from http://www.sanfoundry.com/java-program-find-hamiltonian-cycle-unweighted-graph/:

import java.util.Arrays;

public class HamiltonianCycle {

    private int V, pathCount;
    private int[] path;
    private int[] answer;
    private int[][] graph;

    public int[] findHamiltonianCycle(int[][] g) {
        V = g.length;
        path = new int[V];

        Arrays.fill(path, -1);
        graph = g;
        path[0] = 0;
        pathCount = 1;
        if (solve(0)) {
            return path;
        } else {
            return null;
        }
    }

    public boolean solve(int vertex) {
        if (graph[vertex][0] == 1 && pathCount == V) {
            return true;
        }
        if (pathCount == V) {
            return false;
        }

        for (int v = 0; v < V; v++) {
            if (graph[vertex][v] == 1) {
                path[pathCount++] = v;
                graph[vertex][v] = 0;
                graph[v][vertex] = 0;

                if (!isPresent(v)) {
                    if (solve(v)) {
                        answer = path.clone();
                        return true;
                    }
                }

                graph[vertex][v] = 1;
                graph[v][vertex] = 1;
                path[--pathCount] = -1;
            }
        }
        return false;
    }

    public boolean isPresent(int v) {
        for (int i = 0; i < pathCount - 1; i++) {
            if (path[i] == v) {
                return true;
            }
        }
        return false;
    }
}

Be warned: this will be very slow for large numbers of combinations...

tucuxi
  • 17,561
  • 2
  • 43
  • 74
  • Thanks for your extensive answer! However, when looking at your (7,4) example results I have to ask if this really fulfils the requirement of having each combination as different as possible compared to the previous one? I would expect something like [0, 1, 2, 3] followed by [0, 4, 5, 6], [1, 2, 3, 4], [5, 6, 0, 1], ... Did I misunderstand? – JHH Mar 10 '15 at 21:10
  • To visualize, let's say this was a tournament of some sort of game where in each round 4 players face each-other, and we want the schedule to be as fair as possible when it comes to consecutive games and waiting time in between games. In your example, player 2 would start by playing 8 games in succession, while player 6 would have to wait for three rounds before getting to play at all. I imagine it would be possible to find an ordering where at most 1 element is shared between two consecutive combinations? – JHH Mar 10 '15 at 21:12
  • After changing distance calculation (now included in answer), it now generates your suggested sequence. – tucuxi Mar 10 '15 at 22:31
  • Thanks again @tucuxi, really appreciate your efforts. For (7, 3) this indeed seems to produce the desired results (although combinations are not produced on demand, but that was an optional bonus requirement :)). But I do get some unexpected results when changing n and k. For (4,2) I don't get any output at all (solve() returns false) and for (6,2) I do indeed get a decent variation of elements in the sense that elements are not present in two consecutive combinations, but nevertheless, some elements are "discriminated", for example element 5 does not appear in any of the 6 first combinations. – JHH Mar 11 '15 at 14:10
  • I do feel that I should look more closely at your solution in order to understand the underlying theory before I experiment more though. :) – JHH Mar 11 '15 at 14:11
  • Basic explanation: connect dots (=valid 7,4 combinations) only if they are different enough so that placing them adjacent is ok. Then, find a path that connects all dots in single path without ever using the same dot twice (= a Hamiltonian cycle). – tucuxi Mar 11 '15 at 14:34
1

While I really appreciate the efforts from @tucuxi and @David Eisenstadt, I found some issues with the Hamiltonian approach, namely that it didn't solve for certain values of n and k and certain elements were also unnecessarily discriminated.

I decided to give the ideas listed in my question a go (hit count table) and it seems to give pretty good results. This solution however also requires a post-generation sorting which does not fulfill the on-demand bonus requirement. For reasonable n and k this should be feasible though.

Admittedly, I've found that my algorithm sometimes seems to favour combinations that result in consecutive appearances for one element, which can probably be tuned. But as of now, my algorithm might be inferior to tucuxi's for (7,4) specifically. It does however present a solution for every n, k and seems to discriminate elements less.

My code is listed below.

Thanks again.

public class CombinationGenerator {
    private final int N;
    private final int K;
    private final int[] mCurrentCombination;

    public CombinationGenerator(int n, int k) {
        N = n;
        K = k;
        mCurrentCombination = new int[k];

        setElementSequence(0, 0);
        mCurrentCombination[K - 1]--; // fool the first iteration
    }

    private void setElementSequence(int startPos, int startValue) {
        for (int i = startPos; i < K; i++) {
            mCurrentCombination[i] = i + startValue - startPos;
        }
    }

    public int[] getNextCombination() {
        for (int i = K - 1; i >= 0; i--) {
            if (mCurrentCombination[i] < i + N - K) {
                setElementSequence(i, mCurrentCombination[i] + 1);
                return mCurrentCombination.clone();
            }
        }

        return null;
    }   
}

public class CombinationSorter {
    private final int N;
    private final int K;

    public CombinationSorter(int n, int k) {
        N = n;
        K = k;
    }

    public List<int[]> getSortedCombinations(List<int[]> combinations) {
        List<int[]> sortedCombinations = new LinkedList<int[]>();
        int[] combination = null;
        int[] hitCounts = new int[N]; // how many times each element has been
                                      // used so far

        // Note that this modifies (empties) the input list
        while (!combinations.isEmpty()) {
            int index = findNextCombination(combinations, hitCounts, combination);
            combination = combinations.remove(index);
            sortedCombinations.add(combination);

            addHitCounts(combination, hitCounts);
        }

        return sortedCombinations;
    }

    private int findNextCombination(List<int[]> combinations, int[] hitCounts,
            int[] previousCombination) {
        int lowestHitCount = Integer.MAX_VALUE;
        int foundIndex = 0;

        // From the remaining combinations, find the one with the least used
        // elements
        for (int i = 0; i < combinations.size(); i++) {
            int[] comb = combinations.get(i);
            int hitCount = getHitCount(comb, hitCounts);

            if (hitCount < lowestHitCount) {
                lowestHitCount = hitCount;
                foundIndex = i;
            } else if (hitCount == lowestHitCount
                    && previousCombination != null
                    && getNrCommonElements(comb, previousCombination) < getNrCommonElements(
                            combinations.get(foundIndex), previousCombination)) {
                // prefer this combination if hit count is equal but it's more
                // different from the previous combination in our sorted list
                // than what's been found so far (avoids consecutive element
                // appearances)
                foundIndex = i;
            }
        }

        return foundIndex;
    }

    private int getHitCount(int[] combination, int[] hitCounts) {
        int hitCount = 0;

        for (int i = 0; i < K; i++) {
            hitCount += hitCounts[combination[i]];
        }

        return hitCount;
    }

    private void addHitCounts(int[] combination, int[] hitCounts) {
        for (int i = 0; i < K; i++) {
            hitCounts[combination[i]]++;
        }
    }

    private int getNrCommonElements(int[] c1, int[] c2) {
        int count = 0;

        for (int i = 0; i < K; i++) {
            for (int j = 0; j < K; j++) {
                if (c1[i] == c2[j]) {
                    count++;
                }
            }
        }
        return count;
    }
}

public class Test {
    public static void main(String[] args) {
        final int n = 7;
        final int k = 3;

        CombinationGenerator cg = new CombinationGenerator(n, k);
        List<int[]> combinations = new LinkedList<int[]>();
        int[] nc;

        while ((nc = cg.getNextCombination()) != null) {
            combinations.add(nc);
        }

        for (int[] c : combinations) {
            System.out.println(Arrays.toString(c));
        }

        System.out.println("Sorting...");

        CombinationSorter cs = new CombinationSorter(n, k);
        List<int[]> sortedCombinations = cs.getSortedCombinations(combinations);

        for (int[] sc : sortedCombinations) {
            System.out.println(Arrays.toString(sc));
        }
    }

}

Results for (7,4):

[0, 1, 2, 3]
[0, 4, 5, 6]
[1, 2, 3, 4]
[0, 1, 5, 6]
[2, 3, 4, 5]
[0, 1, 2, 6]
[3, 4, 5, 6]
[0, 1, 2, 4]
[0, 3, 5, 6]
[1, 2, 4, 5]
[0, 1, 3, 6]
[2, 4, 5, 6]
[0, 1, 3, 4]
[2, 3, 5, 6]
[0, 1, 4, 5]
[0, 2, 3, 6]
[1, 3, 4, 5]
[0, 2, 4, 6]
[1, 2, 3, 5]
[0, 1, 4, 6]
[0, 2, 3, 5]
[1, 2, 4, 6]
[1, 3, 5, 6]
[0, 2, 3, 4]
[1, 2, 5, 6]
[0, 3, 4, 5]
[1, 2, 3, 6]
[0, 2, 4, 5]
[1, 3, 4, 6]
[0, 2, 5, 6]
[0, 1, 3, 5]
[2, 3, 4, 6]
[1, 4, 5, 6]
[0, 1, 2, 5]
[0, 3, 4, 6]

Results for (5,2):

[0, 1]
[2, 3]
[0, 4]
[1, 2]
[3, 4]
[0, 2]
[1, 3]
[2, 4]
[0, 3]
[1, 4]
JHH
  • 8,567
  • 8
  • 47
  • 91