
Consider a list of permutations (order-relevant combinations) of the form:

(1 2 3)
(1 2 4)
(5 2 3)
(5 2 4)

I need to find the smallest number of generating sets that covers this list. For example, given the permutations above,

(1 2 3,4)
(5 2 3,4)

is not an optimal solution. The optimal solution is (1,5 2 3,4).

You will notice that this solution contains the sets A={1,5}, B={2}, and C={3,4}. The original list of permutations is the ordered Cartesian product of these sets: A x B x C.

I would like to partition the list of permutations into the fewest possible groups, each expressed as sets A, B, and C whose product contains all the permutations in that group. The final answer is usually, but not always, a list of lists of sets, since it is not always possible to reduce the list of permutations to a single list of generating sets. That is, it is usually the case that the product of sets A, B, and C accounts for some of the permutations in the list, but sets D, E, and F are required to account for other permutations, and so on.
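To make the goal concrete, a candidate answer can be checked mechanically: expanding A x B x C must reproduce exactly the permutations it claims to cover. A minimal sketch (the class and method names here are illustrative, not from any library):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CartesianCheck {
    // Expands A x B x C into the set of ordered triples it generates.
    public static Set<List<Integer>> expand(List<Integer> a, List<Integer> b, List<Integer> c) {
        Set<List<Integer>> out = new HashSet<>();
        for (int x : a)
            for (int y : b)
                for (int z : c)
                    out.add(Arrays.asList(x, y, z));
        return out;
    }

    public static void main(String[] args) {
        Set<List<Integer>> input = new HashSet<>(Arrays.asList(
            Arrays.asList(1, 2, 3), Arrays.asList(1, 2, 4),
            Arrays.asList(5, 2, 3), Arrays.asList(5, 2, 4)));
        // (1,5 2 3,4) should generate exactly the four input triples.
        Set<List<Integer>> product = expand(
            Arrays.asList(1, 5), Arrays.asList(2), Arrays.asList(3, 4));
        System.out.println(product.equals(input)); // prints "true"
    }
}
```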

My crude attempt at solving this problem involved checking whether any two permutations in the list matched on two of their slots. If so, I recursively merged those two combinations, i.e.

(1 2 3)
(1 2 4)

produce

(1 2 3,4)

Unfortunately, merging such combinations is not associative: the result depends on the order in which the merges are performed. An ideal solution would merge these combinations in such a way that the final list of sets subsumes the largest possible number of permutations.

To demonstrate the problem with associativity, take this example, which cannot reduce to fewer than two lists of generating sets:

(1 2 4) 
(1 2 3)
(1 2 5) 
(5 2 3) 
(5 2 4)

Suppose you were to recursively merge these according to the following rule: “If any two columns match, I merge by preserving those columns as-is. I then merge the two sets in the third column to produce the new set. After the merger of these two rows, I throw out the two original rows, so they are not re-merged or double-counted.” The order of these merges is significant. Given the list of permutations above, if I merge (1 2 3) and (1 2 4), I get (1 2 3,4). Now, how do I conduct the next merge to optimize the generating set? Suppose I look at (1 2 5) and see that it matches (1 2 3,4) on two columns. I perform the merge and get (1 2 3,4,5). All appears to be well. However, I then merge (5 2 3) and (5 2 4), which results in (5 2 3,4). I compare (5 2 3,4) and (1 2 3,4,5). I do not have a two-column match, so I stop merging. I have reduced the output to (5 2 3,4) and (1 2 3,4,5).

Now suppose that I merged in a different order. I merge (1 2 3) and (1 2 4) to produce (1 2 3,4). Then I merge (5 2 3) and (5 2 4) to produce (5 2 3,4). I see that I have a match between these two products. I then merge (1 2 3,4) and (5 2 3,4) to produce (1,5 2 3,4). The final list of generating sets is (1,5 2 3,4) and (1 2 5). So you see that the merge order has produced two different answers: (1,5 2 3,4) and (1 2 5) vs. (5 2 3,4) and (1 2 3,4,5).

In this case I would probably settle for either answer, since each contains the same number (2) of lists of generating sets. However, (1,5 2 3,4) and (1 2 5) is slightly preferable, because (1,5 2 3,4) subsumes the largest possible number of combinations. Suppose, however, that we have a list of 900 combinations. With a bottom-up solution, the merge order can cause you to reduce the problem in a non-optimal way, where optimal means the smallest possible count of lists of generating sets. It's hard to know the right merge order without looking ahead through every possible merge path, which is no more efficient than the brute-force method of finding matches, which I have also tried.
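The two-column merge rule described above can be sketched as a single function: given two rows that agree on exactly two columns, union the third. A standalone sketch with illustrative names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class MergeRule {
    // A row is three sets of integers, one per column.
    // Returns the merged row if exactly one column differs; otherwise null.
    public static List<Set<Integer>> merge(List<Set<Integer>> r1, List<Set<Integer>> r2) {
        int diff = -1;
        for (int i = 0; i < 3; i++) {
            if (!r1.get(i).equals(r2.get(i))) {
                if (diff != -1) return null; // more than one column differs
                diff = i;
            }
        }
        if (diff == -1) return null; // identical rows: nothing to merge
        List<Set<Integer>> merged = new ArrayList<>(r1);
        Set<Integer> union = new TreeSet<>(r1.get(diff));
        union.addAll(r2.get(diff));
        merged.set(diff, union);
        return merged;
    }

    // Convenience builder for a row from literal column values.
    public static List<Set<Integer>> row(int[] a, int[] b, int[] c) {
        List<Set<Integer>> r = new ArrayList<>();
        for (int[] col : new int[][]{a, b, c}) {
            Set<Integer> s = new TreeSet<>();
            for (int v : col) s.add(v);
            r.add(s);
        }
        return r;
    }

    public static void main(String[] args) {
        // (1 2 3) merged with (1 2 4) yields (1 2 3,4).
        System.out.println(merge(row(new int[]{1}, new int[]{2}, new int[]{3}),
                                 row(new int[]{1}, new int[]{2}, new int[]{4})));
        // prints "[[1], [2], [3, 4]]"
    }
}
```

The order-dependence problem is visible here: this function only decides whether a single pair may merge, and nothing in it tells you which of several legal merges to perform first.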

I have also attempted the brute-force method. Why is its efficiency unacceptable? That method first finds the list of unique integers across all columns. It then generates the power set (every possible subset) of these integers, for column/set A, column/set B, and column/set C. It orders these sets by size (biggest to smallest), computes the Cartesian product of each set with every set from the other columns, and then loops through these Cartesian products, keyed by their generating sets, to check whether each product matches permutations from the input. This is roughly O(n^3), where n is the number of combinations in the input list. If I could do this even in O(n^2), it would be a win compared to where it is now.

As far as memory considerations go, let's assume that the domain for each slot is the integers 1-12. The number of distinct combinations across three slots is 12!/3!: you're looking at over 79 million raw combinations, before they are even partitioned into sets by Google's Guava collection APIs (which I'd highly recommend, BTW). We could probably generate the set lazily, and I get the sense that Guava does, but the memory constraints are still large.

I'm really looking for an algorithm to solve this problem. However, if there's a Java library or method that will take care of this with minimal pain, I'm up for that too. Thanks.

foundation
  • I'm not sure I understand what you mean by generating sets. Can you elaborate? – templatetypedef Jan 02 '14 at 20:02
  • You should also check out http://cs.stackexchange.com -- from the about section `Ask about... algorithms, models of computation` – C.B. Jan 02 '14 at 20:02
  • By generating sets, I mean that if you take each element from the first slot’s set then match it with each element in the second slot’s set then match that with each element in the third slot’s set you would have produced every combination in the list. In effect, I’m looking for something like a reverse Cartesian product calculator. It may take more than one Cartesian product to account for every combination in the list and in fact usually does. – foundation Jan 02 '14 at 20:20
  • It may be better to refer to this as a generating list, because it is NOT a generating set like 123, whose permutations are `123 132 213 231 312 321`. As I’ve formulated the problem, (1 2 3) would produce one permutation only, namely (1 2 3). – foundation Jan 02 '14 at 20:20
  • Thanks, C.B. I'll check it out. :) – foundation Jan 02 '14 at 20:38
  • What would be the generating list for this set of permutations? `(1 2 4) (1 2 3) (5 2 3) (5 2 4)` I'm curious about your issue with non-associativity. – Robert Fischer Jan 03 '14 at 15:31
  • I have edited the question above to explain the associativity issue in greater detail, since it's pretty detailed. (1,5 2 3,4) would generate the list above. Its order of merges is irrelevant. If I merge (1 2 3) and (5 2 3) to (1,5 2 3), then (1 2 4) and (5 2 4) to (1,5 2 4), then (1,5 2 4) and (1,5 2 3) to end with (1,5 2 3,4), I get the same answer as if I merged in a different order. However, this is not always the case, as you can see from my edits above. – foundation Jan 03 '14 at 16:32
  • Knowing which order of processing will reduce to the optimally smallest list smells like an NP problem to me. When you say "efficient", what do you mean? What are we trying to optimize? Also, are we fixed to lists of size 3? (My inclination is to sort by popularity, with the most popular lists being merged in first, leaving the unpopular lists as stragglers.) – Robert Fischer Jan 03 '14 at 16:41
  • Welcome to SO. Nice question. – drankin2112 Jan 03 '14 at 17:37
  • The list length is fixed to three. I have the same problem for four slots, but it is the same in every other way. I’ve edited the question a little to explain the efficiency problems with memory and computational time more. What do you mean by sort by popularity? I also tried representing these as a tree. The first column was at the parent level of the tree; the third consisted purely of leaves. Unfortunately, it unbalances the tree a little bit. You start to get stuff like (1 2 3,4,5,6,7) and your chances of merging (1 2 3,4,5,6,7) with other subtrees is unlikely. – foundation Jan 03 '14 at 17:54
  • So then I started researching matrices, tries, and quite a few other data structures in the hopes that I might find a better match. Still no joy. I feel that some set theorist or graph theorist must have solved this, but, alas, I wasn’t able to glean much from StackOverflow or MathOverflow. – foundation Jan 03 '14 at 17:55
  • Thanks, @drankin2112. This one has left me profoundly stumped. I'm astonished that I haven't been able to find an answer on Google. It seems like a general enough problem. – foundation Jan 03 '14 at 18:39
  • Your question might seem less confusing if you didn't use the word [permutation](https://en.wikipedia.org/wiki/Permutation), since in mathematics it doesn't actually mean what you seem to be using it to mean. (Also, _if_ I'm understanding your question right, it seems similar to [optimal rectangle decomposition](http://stackoverflow.com/questions/5919298/algorithm-for-finding-the-fewest-rectangles-to-cover-a-set-of-rectangles), except in three (or four) dimensions instead of two.) – Ilmari Karonen Jan 03 '14 at 20:05
  • I'll post an answer for you in a little bit. Please like it if it helps:) – drankin2112 Jan 03 '14 at 20:39
  • Ilmari, the list of permutations is a list of integers in "an arrangement of those objects into a particular order." In fact, some of these integer lists start as a user permuting (1 2 3), then removing combinations. Thanks for the link. If this were represented as a graph, it would be a directed graph, which I'm not sure the rectangle decomposition accounts for? That is, (1 2 3) is not a set whose order is irrelevant. I can move 1->2->3, but not in the opposite order. Perhaps if the intersection matrix could somehow consider directionality? – foundation Jan 03 '14 at 20:53
  • If you don't mind me asking, what kind of software are you writing with this? – drankin2112 Jan 04 '14 at 02:15

1 Answer


Thanks everyone for your insights and answers. I'd like to credit this answer to John Kolen (http://www.johnfkolen.com), who has presented a feasible solution as follows:

A Greedy Algorithm for Maximal Coverage of Triples

Input: A set X of triples drawn from A x B x C, where A, B, and C are sets of integers.

Output: A set of n triples (A_i, B_i, C_i), where A_i ⊆ A, B_i ⊆ B, and C_i ⊆ C, such that Union_i A_i x B_i x C_i = X and the products A_i x B_i x C_i are pairwise disjoint.

Algorithm

The algorithm can be broken into three parts: finding pair covers, finding triple covers, and finally constructing a total cover.

Finding covers of pairs from a subset of B x C involves building a map from elements of B to sets of C. Essentially a small prefix tree, or trie, this data structure makes it easy to find the maximal cover of a set of prefixes. For instance, if b_1->C_1 and b_2->C_2, then the maximal cover involving b_1 and b_2 will be <[b_1,b_2], C_1 intersection C_2>.
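The pair-cover idea fits in a few lines: map each prefix b to its set of suffixes c, then intersect the suffix sets of the prefixes being combined. A standalone sketch with illustrative names (not the answer's exact code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PairCoverSketch {
    // Returns the suffixes (c values) shared by prefixes b1 and b2.
    public static Set<Integer> commonSuffixes(int[][] pairs, int b1, int b2) {
        // b -> set of c values observed with that b
        Map<Integer, Set<Integer>> suffixes = new HashMap<>();
        for (int[] p : pairs)
            suffixes.computeIfAbsent(p[0], k -> new TreeSet<>()).add(p[1]);
        Set<Integer> common = new TreeSet<>(suffixes.get(b1));
        common.retainAll(suffixes.get(b2)); // set intersection
        return common;
    }

    public static void main(String[] args) {
        int[][] pairs = {{2, 3}, {2, 4}, {6, 3}, {6, 4}, {6, 5}};
        // Combining prefixes b=2 and b=6 keeps only the shared suffixes,
        // giving the cover <[2,6],[3,4]>, which accounts for 4 of the 5 pairs.
        System.out.println(commonSuffixes(pairs, 2, 6)); // prints "[3, 4]"
    }
}
```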

Once we can find maximal pairs, it's possible to construct maximal triples. This time, however, the prefix (A) maps to a set of pairs (B x C). Using the previous method, it's possible to examine all subsets of A and find their matching pair coverage over the suffixes.

The greedy approach finds the locally best cover and adds it to the solution set. The triples captured by the current cover are removed from consideration, and the process continues until the locally best cover is a singleton. The remaining triples are added to the solution.

The relevant code follows.

import java.io.IOException;
import java.io.FileReader;
import java.io.BufferedReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;
import java.util.Arrays;

class Triples {
    private ArrayList<int[]> _data;
    private HashMap<Integer,HashSet<Integer>> _tree;

    /**
     * Instantiates a container of triples read from the provided file.
     * <p>
     * File should contain lines of the form '[#,#,#]', where # is an
     * integer. Spaces are ignored.
     *
     * @param filename a path to a file containing triples
     */
    public Triples(String filename) {
    _data = new ArrayList<int[]>();
    _tree = new HashMap<Integer, HashSet<Integer>>();
    readfile(filename);
    }

    /**
     * Returns the number of triples.
     *
     * @return the number of triples.
     */
    int size() {
    return _data.size();
    }

    /**
     * Returns a set of encoded pairs (b,c) where (first, b, c) is in the
     * set of triples. The pairs are encoded as integers using b * 256 + c.
     *
     * @param  first  the first value of a triple
     * @return        a set of pair encodings
     */
    HashSet<Integer> tree(int first) {
    return _tree.get(first);
    }

    public class PairCover 
    {
    public ArrayList<Integer> a;
    public ArrayList<Integer> b;
    PairCover() 
    {
        a = b = null;
    }
    PairCover(ArrayList<Integer> ax, ArrayList<Integer> bx) 
    {
        a = ax;
        b = bx;
    }
    /**
     * Returns the number of pairs covered by this cover.
     *
     * @return the number of pairs covered by this cover.
     */
    public int score()
    {
        if (a != null && b != null)
        return a.size() * b.size();
        else
        return 0;
    }
    public String toString() {
        return "<"+a+","+b+">";
    }
    }

    /**
     * Compute the maximal pair covering (B,C) for a set of suffix pairs.
     *
     * @return pair covering where |BxC| is maximized.
     */
    public PairCover maxPairCover(HashSet<Integer> set) {
    // unpack the pairs
    HashMap<Integer,NSet> pairs = new HashMap<Integer,NSet>();
    for(Integer value : set) {
        Integer a = value / 256;
        Integer b = value & 255;
        if (!pairs.containsKey(a))
        pairs.put(a, new NSet());
        pairs.get(a).add(b);
    }

    _mpc_best = new PairCover();
    Integer[] remain_ary = pairs.keySet().toArray(new Integer[0]);

    // recursively examine all subsets pair first values and find matching
    // second value groups.
    maxPairCoverRec(pairs,
            new ArrayList<Integer>(),
            new ArrayList<Integer>(Arrays.asList(remain_ary)),
            null);

    return _mpc_best;
    }

    /**
     * Recursively compute the maximal pair covering (B,C) for a set of 
     * suffix pairs from a set of active prefixes by combining all possible
     * subsets of the remaining prefixes.
     * <p>
     * Stores the best pair cover in _mpc_best. This "global" variable makes it
     * easy to prune search.
     *
     * @param pairs tree-compressed set of pairs
     * @param active currently active pair prefixes
     * @param remain set of pair prefixes to explore
     * @param set the pair suffixes shared by all the active prefixes
     * @return pair covering where |BxC| is maximized.
     */
    void maxPairCoverRec(HashMap<Integer,NSet> pairs,
                 ArrayList<Integer> active,
                 ArrayList<Integer> remain,
                 NSet set) 
    {
    if (set != null) {
        // Prune search if suffix set is empty or that there is no way
        // to generate a better pair cover than the current cover.
        if (set.isEmpty() ||
        (active.size() + remain.size()) * set.size()
        <= _mpc_best.score())
        return ;
    }
    if (remain.isEmpty()) {
        if (active.size() > 0) {
        // Save the current pair cover if it's better than the current
        // cover.
        if (_mpc_best.score() < set.size() * active.size()) {
            _mpc_best.a = (ArrayList<Integer>)(active.clone());
            _mpc_best.b = (ArrayList<Integer>)(set.to_array_list());
        }
        }
        return;
    }

    ArrayList<Integer> active_r = (ArrayList<Integer>)(active.clone());
    ArrayList<Integer> remain_r = (ArrayList<Integer>)(remain.clone());
    Integer a = remain_r.remove(0);
    // Explore active sets without element a.
    maxPairCoverRec(pairs, active_r, remain_r, set);

    // Explore active sets with element a. Perform intersection of 
    // current suffix set with suffixes of a.
    if (set == null) {
        set = pairs.get(a);
    }
    else {
        set = set.intersection(pairs.get(a));
    }
    active_r.add(a);
    maxPairCoverRec(pairs, active_r, remain_r, set);
    }

    public class TripleCover 
    {
    public ArrayList<Integer> a;
    public ArrayList<Integer> b;
    public ArrayList<Integer> c;
    TripleCover() 
    {
        a = b = c = null;
    }
    TripleCover(ArrayList<Integer> ax,
            ArrayList<Integer> bx,
            ArrayList<Integer> cx) 
    {
        a = ax;
        b = bx;
        c = cx;
    }
    TripleCover(int ax,
            int bx,
            int cx) 
    {
        a = new ArrayList<Integer>();
        a.add(ax);
        b = new ArrayList<Integer>();
        b.add(bx);
        c = new ArrayList<Integer>();
        c.add(cx);
    }
    /**
     * Returns the number of pairs covered by this cover.
     *
     * @return the number of pairs covered by this cover.
     */
    public int score()
    {
        if (a != null && b != null && c != null)
        return a.size() * b.size() * c.size();
        else
        return 0;
    }
    public String toString() {
        return "<"+a+","+b+","+c+">";
    }
    }

    /**
     * Compute the maximal triple cover for the pairs currently stored
     * in _tree.
     *
     * @return a triple cover with |AxBxC| maximized
     */
    TripleCover maxTripleCover() {
    _mtc_best = new TripleCover();
    Integer[] remain_ary = _tree.keySet().toArray(new Integer[0]);
    ArrayList<Integer> remain = 
        new ArrayList<Integer>(Arrays.asList(remain_ary));
    maxTripleCoverRec(new ArrayList<Integer>(),
                 remain,
                 null);
    return _mtc_best;
    }

    /**
     * Recursively explore all subsets of values in first position and
     * find the largest cover for the second and third positions.
     * <p>
     * Stores the best triple cover in _mtc_best. This "global" variable 
     * makes it easy to prune search.
     *
     * @param active the active values of the first position
     * @param remain the additional values of the first position to explore
     * @param set the current set of second/third position pairs that are shared by the active values
     */
    void maxTripleCoverRec(ArrayList<Integer> active,
               ArrayList<Integer> remain,
               HashSet<Integer> set) {
    if (set != null &&
        (set.isEmpty() ||
         (remain.size()>0 && 
          (active.size() + remain.size()) * set.size()
          <= _mtc_best.score()))){
        return;
    }
    if (remain.isEmpty()) {
        if (active.size() > 0) {
        PairCover mpc = maxPairCover(set);
        if (_mtc_best.score() < mpc.score() * active.size()) {
            _mtc_best.a = (ArrayList<Integer>)(active.clone());
            _mtc_best.b = mpc.a;
            _mtc_best.c = mpc.b;
        }
        }
        return;
    }

    ArrayList<Integer> active_r = (ArrayList<Integer>)(active.clone());
    ArrayList<Integer> remain_r = (ArrayList<Integer>)(remain.clone());
    Integer a = remain_r.remove(0);
    maxTripleCoverRec(active_r, remain_r, set);
    if (set == null) {
        set = (HashSet<Integer>)(_tree.get(a).clone());
    }
    else {
        set = (HashSet<Integer>)(set.clone());
        set.retainAll(_tree.get(a));
    }
    active_r.add(a);
    maxTripleCoverRec(active_r, remain_r, set);

    }

    /**
     * Finds a covering set for the triples using a greedy approach.
     * The largest cover of the current triples is found, recorded, and
     * the affected triples are removed from consideration. This process
     * continues until singleton covers are left.
     *
     * @return a list of covers
     */
    ArrayList<TripleCover> greedy() {
    // Since the prefix tree is modified as new covers are found,
    // we make a copy
    HashMap<Integer,HashSet<Integer>> old_tree = _tree;
    _tree = (HashMap<Integer,HashSet<Integer>>)(_tree.clone());

    ArrayList<TripleCover> solution = new ArrayList<TripleCover>();
    TripleCover cover;
    do {
        cover = maxTripleCover();  // find best
        solution.add(cover);  // record it
        remove(cover);  // remove covered triples from tree
        System.out.println("" + cover + " ==> " + cover.score());
    } while (cover.score() > 1);

    // Nothing left but singletons, so add them to solution
    for(Integer a : _tree.keySet()) {
        for(Integer pair : _tree.get(a)) {
        int b = pair / 256;
        int c = pair & 255;
        solution.add(new TripleCover(a,b,c));
        }
    }

    // restore prefix tree
    _tree = old_tree; 

    return solution;
    }

    //************************************************************

    private PairCover _mpc_best;
    private TripleCover _mtc_best;


    /**
     * Reads triples from the specified file. Will exit if file does not
     * exist.
     *
     * @param filename a path to a file containing triples
     */
    private void readfile(String filename) {
    try {
        FileReader fr = new FileReader(filename);
        BufferedReader reader = new BufferedReader(fr);
        String line = null;
        while ((line = reader.readLine()) != null) {
        line = line.replace("[","").replace("]","").replace(" ","");
        String[] svalues = line.split(",");
        int[] values = new int[3];
        values[0] = Integer.parseInt(svalues[0]);
        values[1] = Integer.parseInt(svalues[1]);
        values[2] = Integer.parseInt(svalues[2]);
        _data.add(values);
        Integer key = Integer.valueOf(values[0]);
        if (!_tree.containsKey(key)) {
            _tree.put(key, new HashSet<Integer>());
        }
        _tree.get(key).add(values[1] * 256 + values[2]);
        }
    } catch (IOException e) {
        System.out.println("Could not open " + filename);
        System.exit(1);
    }
    }

    /**
     * Removes covered triples from the prefix tree
     *
     * @param tc a covering
     */
    private void remove(TripleCover tc) {
    for(Integer a : tc.a) {
        HashSet<Integer> set = _tree.get(a);
        for(Integer b : tc.b) {
        for(Integer c : tc.c) {
            set.remove(b * 256 + c);
        }
        }
        if (set.isEmpty()) {
        _tree.remove(a);
        }
    }
    }

}
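Note that the listing above references an NSet class that was not included in the answer. A minimal stand-in consistent with the calls made on it (add, isEmpty, size, clone, intersection, to_array_list) might look like the following; the original implementation may differ:

```java
import java.util.ArrayList;
import java.util.HashSet;

// Hypothetical stand-in for the NSet class referenced by Triples:
// a thin wrapper around HashSet<Integer> with the two extra operations
// that maxPairCover and maxPairCoverRec rely on.
class NSet extends HashSet<Integer> {
    public NSet() { super(); }
    public NSet(NSet other) { super(other); }

    // Returns a new NSet containing the elements present in both sets.
    public NSet intersection(NSet other) {
        NSet result = new NSet(this);
        result.retainAll(other);
        return result;
    }

    // Returns the elements as an ArrayList, as maxPairCoverRec expects.
    public ArrayList<Integer> to_array_list() {
        return new ArrayList<Integer>(this);
    }
}
```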
foundation