4

Everything is in the title .. =)

I want to create a powerset fastly, only with subset of length between 3 (minimum constant length) and a maximum. This maximum should be 5 most of the times but I want it variable (from 3 to 5,6). The original set may be large. I need to store theses subset for further processing.

I've seen Obtaining a powerset of a set in Java, but here they generate every subset of the power set. I've also seen efficient powerset algorithm for subsets of minimal length , for C#, but I think ,as Adam S said, I will run into low running time performances and memory issues.

I'd like to combine theses ideas into an ideal one for my goal. My last hope would be to generate every subset fastly ( maybe with the algorithm in Guava ) and to iterate over to take only with desired length...but even to read it's ugly ;)

Anyone got ideas ?

Community
  • 1
  • 1
Nikkolasg
  • 444
  • 4
  • 18

3 Answers3

4

I borrowed heavily from st0le's answer.

Basically, as I iterated through the bit array that was controlling the set generation, I checked to make sure that the number of bits was between the minimum and maximum.

Here's the code.

import java.util.BitSet;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

public class PowerSet<E> implements Iterator<Set<E>>, Iterable<Set<E>> {
    private int     minSize;
    private int     maxSize;
    private E[]     arr     = null;
    private BitSet  bset    = null;

    @SuppressWarnings("unchecked")
    public PowerSet(Set<E> set, int minSize, int maxSize) {
        this.minSize = Math.min(minSize, set.size());
        this.maxSize = Math.min(maxSize, set.size());

        arr = (E[]) set.toArray();
        bset = new BitSet(arr.length + 1);

        for (int i = 0; i < minSize; i++) {
            bset.set(i);
        }
    }

    @Override
    public boolean hasNext() {
        return !bset.get(arr.length);
    }

    @Override
    public Set<E> next() {
        Set<E> returnSet = new TreeSet<E>();
        // System.out.println(printBitSet());
        for (int i = 0; i < arr.length; i++) {
            if (bset.get(i)) {
                returnSet.add(arr[i]);
            }
        }

        int count;
        do {
            incrementBitSet();
            count = countBitSet();
        } while ((count < minSize) || (count > maxSize));

        // System.out.println(returnSet);
        return returnSet;
    }

    protected void incrementBitSet() {
        for (int i = 0; i < bset.size(); i++) {
            if (!bset.get(i)) {
                bset.set(i);
                break;
            } else
                bset.clear(i);
        }
    }

    protected int countBitSet() {
        int count = 0;
        for (int i = 0; i < bset.size(); i++) {
            if (bset.get(i)) {
                count++;
            }
        }
        return count;

    }

    protected String printBitSet() {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < bset.size(); i++) {
            if (bset.get(i)) {
                builder.append('1');
            } else {
                builder.append('0');
            }
        }
        return builder.toString();
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException("Not Supported!");
    }

    @Override
    public Iterator<Set<E>> iterator() {
        return this;
    }

    public static void main(String[] args) {
        Set<Character> set = new TreeSet<Character>();
        for (int i = 0; i < 7; i++)
            set.add((char) (i + 'A'));

        PowerSet<Character> pset = new PowerSet<Character>(set, 3, 5);
        int count = 1;
        for (Set<Character> s : pset) {
            System.out.println(count++ + ": " + s);
        }
    }

}

And here are the test results:

1: [A, B, C]
2: [A, B, D]
3: [A, C, D]
4: [B, C, D]
5: [A, B, C, D]
6: [A, B, E]
7: [A, C, E]
8: [B, C, E]
9: [A, B, C, E]
10: [A, D, E]
11: [B, D, E]
12: [A, B, D, E]
13: [C, D, E]
14: [A, C, D, E]
15: [B, C, D, E]
16: [A, B, C, D, E]
17: [A, B, F]
18: [A, C, F]
19: [B, C, F]
20: [A, B, C, F]
21: [A, D, F]
22: [B, D, F]
23: [A, B, D, F]
24: [C, D, F]
25: [A, C, D, F]
26: [B, C, D, F]
27: [A, B, C, D, F]
28: [A, E, F]
29: [B, E, F]
30: [A, B, E, F]
31: [C, E, F]
32: [A, C, E, F]
33: [B, C, E, F]
34: [A, B, C, E, F]
35: [D, E, F]
36: [A, D, E, F]
37: [B, D, E, F]
38: [A, B, D, E, F]
39: [C, D, E, F]
40: [A, C, D, E, F]
41: [B, C, D, E, F]
42: [A, B, G]
43: [A, C, G]
44: [B, C, G]
45: [A, B, C, G]
46: [A, D, G]
47: [B, D, G]
48: [A, B, D, G]
49: [C, D, G]
50: [A, C, D, G]
51: [B, C, D, G]
52: [A, B, C, D, G]
53: [A, E, G]
54: [B, E, G]
55: [A, B, E, G]
56: [C, E, G]
57: [A, C, E, G]
58: [B, C, E, G]
59: [A, B, C, E, G]
60: [D, E, G]
61: [A, D, E, G]
62: [B, D, E, G]
63: [A, B, D, E, G]
64: [C, D, E, G]
65: [A, C, D, E, G]
66: [B, C, D, E, G]
67: [A, F, G]
68: [B, F, G]
69: [A, B, F, G]
70: [C, F, G]
71: [A, C, F, G]
72: [B, C, F, G]
73: [A, B, C, F, G]
74: [D, F, G]
75: [A, D, F, G]
76: [B, D, F, G]
77: [A, B, D, F, G]
78: [C, D, F, G]
79: [A, C, D, F, G]
80: [B, C, D, F, G]
81: [E, F, G]
82: [A, E, F, G]
83: [B, E, F, G]
84: [A, B, E, F, G]
85: [C, E, F, G]
86: [A, C, E, F, G]
87: [B, C, E, F, G]
88: [D, E, F, G]
89: [A, D, E, F, G]
90: [B, D, E, F, G]
91: [C, D, E, F, G]
Community
  • 1
  • 1
Gilbert Le Blanc
  • 50,182
  • 6
  • 67
  • 111
  • great !! c'est genial ;) Thanks a lot. For now it works very well with small number, i'll try it with bigger =) – Nikkolasg Mar 19 '13 at 15:30
  • it is very slow for a set of size 49 with minSize = 30 (more than 10 seconds for first entry...) – avianey Aug 21 '16 at 12:09
1

Please specify if the number of elements are bounded.

For 3 - 5,6 elements an array of indices is feasible, maybe short[6].

For instance with upto 32 element an int could hold the set, and one could either:

// dumb:
for (int mask = 0; ; ;) {
    int cardinality = Integer.bitCount(mask);
    if (3 <= cardinality && cardinality <= 6) {
        ...
    }
}

For more than 64 elements still maybe BitSet would offer compactness.

The decision here, is bound by easy statistics: how many elements and so on. For the int/long/BitSet solution one could start with a BitSet as this has a nicer API. For instance one can go to the next powerset with equal number of bits by flipping two bits. If one is mathematically inclined, De Bruijn cycles might be interesting.

Joop Eggen
  • 107,315
  • 7
  • 83
  • 138
  • I haven't look further into this solution because i wanted a simple solution first. Now I'm looking for a really fast solution. Would it be possible to use BigInteger from java ? because, there is no known bound on the number of elements. because the method incrementBitSet() is a really slow operation and with integer could just be simplified as += 1 ... In fact, I will try to implement it and give feedback. – Nikkolasg Apr 23 '13 at 18:56
  • Pb : biginteger does not have a bitCount method which can return the number of set bit in his representation. I've search for other implementations but didn t find anything useful due to the lack of this function. – Nikkolasg Apr 23 '13 at 19:29
  • `bitCount` generally is something like `int bitCount(int n) { return n == 0? 0 : 1 + bitCount(n & (n - 1)); }` but BigInteger does have a bitCount I thought. – Joop Eggen Apr 24 '13 at 08:02
  • But in your code above, Integer.bitCount returns the number of "1" bits, whereas in biginteger, it returns the number of bit differing from sign bit. Not the same thing, and i want the first one,which is not implemented in biginteger. – Nikkolasg Apr 24 '13 at 11:36
  • You are right, but for positive BigIntegers bitCount is okay. – Joop Eggen Apr 24 '13 at 12:55
0

For Sets whose size do not exceed 63 elements and if you have a minimal size constraint that is not low, this implementation might be faster :

import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

import static com.google.common.base.Preconditions.checkArgument;
import static java.lang.Math.min;

public class PowerSet<E> implements Iterator<Iterable<E>>, Iterable<Iterable<E>> {

    private int minSize;
    private int maxSize;
    private E[] arr = null;
    private long bits = 0;
    private long count = 0;
    private long minMask = 0;

    /**
     * Build a PowerSet of the given {@link Set} to iterate over subsets whose size is between the given boundaries
     * @param set the set to create subsets from
     * @param minSize the minimal size of the subsets
     * @param maxSize the maximum size of the subsets
     */
    @SuppressWarnings("unchecked")
    public PowerSet(Set<E> set, int minSize, int maxSize) {
        checkArgument(maxSize < 63); // limited to 63 because we need one additional bit for hasNext
        this.minSize = min(minSize, set.size());
        this.maxSize = min(maxSize, set.size());
        arr = (E[]) set.toArray();
        for (int n = 0; n < minSize; n++) {
            bits |= (1L << n);
        }
        count = countBitSet(bits);
    }

    @Override
    public boolean hasNext() {
        return (bits & (1L << arr.length)) == 0;
    }

    @Override
    public Iterable<E> next() {
        // compute current subset
        final List<E> returnSet = new LinkedList<>();
        for (int i = 0; i < arr.length; i++) {
            if ((bits & (1L << i)) != 0) {
                returnSet.add(arr[i]);
            }
        }

        // compute next bit map for next subset
        do {
            if (count < minSize) {
                long maxFree = lowestIndex(bits) - 1;
                long missing = minSize - count;
                for (int n = 0; n < min(maxFree, missing); n++) {
                    bits |= (1L << n);
                }
            } else {
                bits++;
            }
            count = countBitSet(bits);
        } while ((count < minSize) || (count > maxSize));
        return returnSet;
    }

    /**
     * Lowest set bit in a long word
     * @param i the long word
     * @return lowest bit set
     */
    private static long lowestIndex(long i) {
        int n = 0;
        while (n < 64 && (i & 1L) == 0) {
            n++;
            i = i >>> 1;
        }
        return n;
    }

    /**
     * Compute the number of bit sets inside a word or a long word.<br/>
     * <a href="http://en.wikipedia.org/wiki/Hamming_weight">Hamming weight</a>
     * @param i the long word
     * @return number of set bits
     */
    private static long countBitSet(long i) {
        i = i - ((i >>> 1) & 0x5555555555555555L);
        i = (i & 0x3333333333333333L) + ((i >>> 2) & 0x3333333333333333L);
        i = ((i + (i >>> 4)) & 0x0F0F0F0F0F0F0F0FL);
        return (i * (0x0101010101010101L)) >>> 56;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException("Not Supported!");
    }

    @Override
    public Iterator<Iterable<E>> iterator() {
        return this;
    }

}

It is backed by a long instead of a BitSet and compute next subset faster... It might also be faster for a minimal constraint that is low.

You can remove dependency over Guava that is used just for the constructor checkArgument...

avianey
  • 5,545
  • 3
  • 37
  • 60