2

Let's say we have a set with n elements. I want to iterate efficiently over all subsets with k elements, such that during iteration, the elements are selected roughly equally.


Example

We have 7 elements: [a,b,c,d,e,f,g] and we want all 4-subsets. There are overall binom(7)(4) subsets.

  1. iteration: [a,b,c,d]
  2. iteration: [e,f,g,a] (now each element was selected at least once)
  3. iteration: [b,c,d,e]
  4. iteration: [f,g,a,b] (now each element was selected at least twice)
  5. iteration: ...

first 21 iterations of my first attempt

This only works, if n is prime. It does not work with n = 6 and k = 3.


I need an algorithm to get ALL possible subsets without generating them beforehand (for binom(30)(15) we already have over 150 million subsets).

(I want to implement this with javascript, but any suggestion is welcome.)

alexvoedi
  • 43
  • 1
  • 4

2 Answers2

2

If n = 2k, then there's a simple solution. Enumerate all k-subsets containing 1, following each by its complement.

If n ≠ 2k but k divides n, then I think you need the construction underlying Baranyai's theorem to keep all numbers within one occurrence of each other. This is not an ideal solution, since it requires both a complicated algorithm and keeping the entire list in memory at once.

I have no idea how to do the other cases off hand (though due to the symmetry of the problem, an NP-hardness reduction (for, e.g., given n and k and g, determine whether it's possible to keep the max difference to at most g) seems extremely unlikely to be forthcoming). Perhaps you could describe your fallback requirements.

David Eisenstat
  • 64,237
  • 7
  • 60
  • 120
1

This answer is efficient but not optimally precise.

There exists an efficient algorithm to unrank combinations, i.e., given n and k and an integer j ∈ {0, ..., (n choose k) − 1}, return the jth combination in lexicographic order.

Compose that algorithm with a format-preserving encryption scheme that randomly permutes {0, ..., (n choose k) − 1} and you get an algorithm that approximately balances the frequency of each element. I think I'd use the FFX mode of AES to get a permutation of {0, ..., m−1} where m is the least power of two not less than n choose k and then do cycle-walking, but someone more knowledgeable than me about crypto might have a better idea.

To control the variance, I'd generalize this algorithm to keep a pool containing a fixed number of combinations, then alternately (1) greedily take the combination from the pool that was the least unfair (2) add the next combination to the pool.

David Eisenstat
  • 64,237
  • 7
  • 60
  • 120