2


I need to calculate all permutations of a collection and i have a code for that but the problem is that it is linear and takes a lot of time.

public static <E> Set<Set<E>> getAllCombinations(Collection<E> inputSet) {
    List<E> input = new ArrayList<>(inputSet);
    Set<Set<E>> ret = new HashSet<>();
    int len = inputSet.size();
    // run over all numbers between 1 and 2^length (one number per subset). each bit represents an object
    // include the object in the set if the corresponding bit is 1
    for (int i = (1 << len) - 1; i > 0; i--) {
        Set<E> comb = new HashSet<>();
        for (int j = 0; j < len; j++) {
            if ((i & 1 << j) != 0) {
                comb.add(input.get(j));
            }
        }
        ret.add(comb);
    }
    return ret;
}

I am trying to make the computation run in parallel.

I though of the option to writing the logic using recursion and then parallel execute the recursion call but i am not exactly sure how to do that.

Would appreciate any help.

Eli Skoran
  • 238
  • 1
  • 10
  • I'm not sure parallel computation would help you unless you have a massive multi-core processor. But if you can use Java8 Stream as return value instead of Set that can help because Sreams create next element on demand. – pcjuzer Dec 08 '16 at 14:15
  • The [Streamplify](https://github.com/beryx/streamplify) library provides parallel streams for permutations, combinations, cartesian products etc. Take a look at the implementation or use it directly in your code. – siordache Dec 08 '16 at 14:24

1 Answers1

4

There is no need to use recursion, in fact, that might be counter-productive. Since the creation of each combination can be performed independently of the others, it can be done using parallel Streams. Note that you don’t even need to perform the bit manipulations by hand:

public static <E> Set<Set<E>> getAllCombinations(Collection<E> inputSet) {
    // use inputSet.stream().distinct().collect(Collectors.toList());
    // to get only distinct combinations
    //  (in case source contains duplicates, i.e. is not a Set)
    List<E> input = new ArrayList<>(inputSet);
    final int size = input.size();
    // sort out input that is too large. In fact, even lower numbers might
    // be way too large. But using <63 bits allows to use long values
    if(size>=63) throw new OutOfMemoryError("not enough memory for "
        +BigInteger.ONE.shiftLeft(input.size()).subtract(BigInteger.ONE)+" permutations");

    // the actual operation is quite compact when using the Stream API
    return LongStream.range(1, 1L<<size) /* .parallel() */
        .mapToObj(l -> BitSet.valueOf(new long[] {l}).stream()
            .mapToObj(input::get).collect(Collectors.toSet()))
        .collect(Collectors.toSet());
}

The inner stream operation, i.e. iterating over the bits, is too small to benefit from parallel operations, especially as it would have to merge the result into a single Set. But if the number of combinations to produce is sufficiently large, running the outer stream in parallel will already utilize all CPU cores.

The alternative is not to use a parallel stream, but to return the Stream<Set<E>> itself instead of collecting into a Set<Set<E>>, to allow the caller to chain the consuming operation directly.

By the way, hashing an entire Set (or lots of them) can be quite expensive, so the cost of the final merging step(s) are likely to dominate the performance. Returning a List<Set<E>> instead can dramatically increase the performance. The same applies to the alternative of returning a Stream<Set<E>> without collecting the combinations at all, as this also works without hashing the Sets.

Holger
  • 285,553
  • 42
  • 434
  • 765
  • took me a while to get, but it's very nice. – Eugene Dec 08 '16 at 15:32
  • shouldn't this *if(size>63)* be a comparison against 64? it's not like we care about sign here. Also, I sort disagree throwing an OutOfMemory for this purpose. – Eugene Dec 08 '16 at 20:54
  • @Eugene: LOL, it actually should be `62` (or `>=63`), as `1L<<63` is already negative. It doesn’t matter for the bit extraction, but `LongStream.range` does care about the sign. Trying to support `≥63` elements complicates the implementation without any benefit, as we still aren’t able to hold the combinations in memory (which is why `OutOfMemoryError` is appropriate). Note that even if each combination was as small as an `int`, a 64 bit JVM can’t have enough memory to hold 2⁶² combinations *in principle*. But we’re talking about `Set`s. If we didn’t reject before, OOME is what would happen. – Holger Dec 09 '16 at 08:20
  • @Eugene: if you return a `Stream` instead, you can go the road of [this answer](http://stackoverflow.com/a/40632965/2711488) which simply cheats and processes only the first 2⁶² possible combinations. Since even if each combination was processed within one nanosecond, processing all of them still took 146 years (and it will actually be much slower), no-one will ever notice. (You could do a correct implementation handling more cases, e.g. using `BigInteger`, but you won’t even be able to test it in your lifetime). – Holger Dec 09 '16 at 08:24
  • I think that LOL was for the part about "who cares?" beyond 30 (or whatever any number) it will take forever for this to be done, right? With everything else I do agree. thx a lot for the follow-up. – Eugene Dec 09 '16 at 14:54
  • @Eugene: it was, because you asked whether the `63` should be a `64` and my response turned out to be that it should be `62`. It’s like you asked “shouldn’t that blue be green” and I answer with “you’re right, it should be red”. Indeed, these numbers don’t matter much. – Holger Dec 09 '16 at 15:01