6

Suppose I have a bag which contains 6 balls (3 white and 3 black). I want to find all possible subsets of a given length, disregarding the order. In the case above, there are only 4 combinations of 3 balls I can draw from the bag:

  • 2 white and 1 black
  • 2 black and 1 white
  • 3 white
  • 3 black

I already found a library in my language of choice that does exactly this, but I find it slow for greater numbers. For example, with a bag containing 15 white, 1 black, 1 blue, 1 red, 1 yellow and 1 green, there are only 32 combinations of 10 balls, but it takes 30 seconds to yield the result.

Is there an efficient algorithm which can find all those combinations that I could implement myself? Maybe this problem is not as trivial as I first thought...

Note: I'm not even sure of the right technic words to express this, so feel free to correct the title of my post.

Stamm
  • 947
  • 5
  • 17
  • 2
    I'm not sure I understand your initial problem: do you want to find *all* possible subsets of the multiset, or define certain ones? Otherwise, you could just as well draw 3 white and 2 black balls. – vinit_ivar Oct 03 '13 at 17:13
  • Yes, I want to find all possible outcomes. I'm updating the question. – Stamm Oct 03 '13 at 17:17
  • 1
    Your update still makes no sense. You say "there are only 4 combinations of balls I can draw from the bag". But this is not true. One combination you missed is "1 white and 1 black", but there are many others. – recursive Oct 03 '13 at 17:20
  • If you want to find all possible subsets of 6 balls, then you need to include subsets of length 0, 1, 2, 3, 4, 5 and 6. Do you want the possible combinations of a given length? – Keeran Brabazon Oct 03 '13 at 17:21
  • Yes, sorry, I want only combinations of a fixed length. I'm updating the question again. – Stamm Oct 03 '13 at 17:26
  • possible duplicate of [n choose k implementation](http://stackoverflow.com/questions/5095407/n-choose-k-implementation) – mbeckish Oct 03 '13 at 17:27
  • 2
    @mbeckish this is not the same as combinations with repetition are required, so is not just n choose k. Also, Stamm you may want to look at [combinations](http://en.wikipedia.org/wiki/Combination) as a starting point, if you are not familiar with them. You want combinations where repetition is allowed – Keeran Brabazon Oct 03 '13 at 17:28
  • 1
    @KeeranBrabazon - I don't see anything about repetition. – mbeckish Oct 03 '13 at 17:31
  • @mbeckish: We can have more than one ball of a certain colour in a subset. – abeln Oct 03 '13 at 17:32
  • @KeeranBrabazon - Nevermind, he just added "disregarding the order". Still, there are many duplicates on this topic. – mbeckish Oct 03 '13 at 17:32
  • @Keeran Brabazon - Yes, I've read the article as a starting point of my work but I'm struggling in finding an efficient algorithm. I've only come up with a naive one: recursively finding all permutations and removing duplicate ones, but it is very slow... – Stamm Oct 03 '13 at 17:35
  • @mbeckish - I didn't find any question that was exactly like mine. But feel free to point me to it if I missed it. – Stamm Oct 03 '13 at 17:37
  • I disagree with the vote to close. It's marked as a duplicate of a question regarding a general n choose k, but this is specifically about a multiset. – recursive Oct 03 '13 at 17:38
  • Umm, there are only 512 *possible* subsets of a {15;1;1;1;1;1} multiset. How could ***any*** computer code take 30 seconds to derive and filter that? I could beat that with a .BAT file. Heck, I could probably beat that by hand with an abacus. – RBarryYoung Oct 03 '13 at 17:43
  • @RBarryYoung: It's probably treating each of the 15 white balls as distinct elements, and filtering duplicates later, which gives over one million subsets before filtering. – recursive Oct 03 '13 at 17:46
  • @recursive You're probably right, but even then, 30 seconds to enumerate only 1 Million combinations is gruesomely slow. – RBarryYoung Oct 03 '13 at 17:49

2 Answers2

5

You can do significantly better than a general choose algorithm. The key insight is to treat each color of balls at the same time, rather than each of those balls one by one.

I created an un-optimized implementation of this algorithm in python that correctly finds the 32 result of your test case in milliseconds:

def multiset_choose(items_multiset, choose_items):
    if choose_items == 0:
        return 1 # always one way to choose zero items
    elif choose_items < 0:
        return 0 # always no ways to choose less than zero items
    elif not items_multiset:
        return 0 # always no ways to choose some items from a set of no items
    elif choose_items > sum(item[1] for item in items_multiset):
        return 0 # always no ways to choose more items than are in the multiset

    current_item_name, current_item_number = items_multiset[0]
    max_current_items = min([choose_items, current_item_number])

    return sum(
        multiset_choose(items_multiset[1:], choose_items - c)
        for c in range(0, max_current_items + 1)
    )

And the tests:

print multiset_choose([("white", 3), ("black", 3)], 3)
# output: 4

print multiset_choose([("white", 15), ("black", 1), ("blue", 1), ("red", 1), ("yellow", 1), ("green", 1)], 10)
# output: 32
recursive
  • 83,943
  • 34
  • 151
  • 241
  • I'll check your code as soon as I have access to my workstation. It seems interesting. I'm not familiar with Python: does it only count, or does it list all subsets? – Stamm Oct 03 '13 at 18:10
  • This algorithm gives only a count, but it wouldn't be too hard to make it list them all. – recursive Oct 03 '13 at 19:31
1

No, you don't need to search through all possible alternatives. A simple recursive algorithm (like the one given by @recursive) will give you the answer. If you are looking for a function that actually outputs all of the combinations, rather than just how many, here is a version written in R. I don't know what language you are working in, but it should be pretty straightforward to translate this into anything, although the code might be longer, since R is good at this kind of thing.

allCombos<-function(len, ## number of items to sample
                    x,   ## array of quantities of balls, by color
                    names=1:length(x)  ## names of the colors (defaults to "1","2",...)
){
  if(length(x)==0)
    return(c())

  r<-c()
  for(i in max(0,len-sum(x[-1])):min(x[1],len))
      r<-rbind(r,cbind(i,allCombos(len-i,x[-1])))

  colnames(r)<-names
  r
}

Here's the output:

> allCombos(3,c(3,3),c("white","black"))
     white black
[1,]     0     3
[2,]     1     2
[3,]     2     1
[4,]     3     0
> allCombos(10,c(15,1,1,1,1,1),c("white","black","blue","red","yellow","green"))
      white black blue red yellow green
 [1,]     5     1    1   1      1     1
 [2,]     6     0    1   1      1     1
 [3,]     6     1    0   1      1     1
 [4,]     6     1    1   0      1     1
 [5,]     6     1    1   1      0     1
 [6,]     6     1    1   1      1     0
 [7,]     7     0    0   1      1     1
 [8,]     7     0    1   0      1     1
 [9,]     7     0    1   1      0     1
[10,]     7     0    1   1      1     0
[11,]     7     1    0   0      1     1
[12,]     7     1    0   1      0     1
[13,]     7     1    0   1      1     0
[14,]     7     1    1   0      0     1
[15,]     7     1    1   0      1     0
[16,]     7     1    1   1      0     0
[17,]     8     0    0   0      1     1
[18,]     8     0    0   1      0     1
[19,]     8     0    0   1      1     0
[20,]     8     0    1   0      0     1
[21,]     8     0    1   0      1     0
[22,]     8     0    1   1      0     0
[23,]     8     1    0   0      0     1
[24,]     8     1    0   0      1     0
[25,]     8     1    0   1      0     0
[26,]     8     1    1   0      0     0
[27,]     9     0    0   0      0     1
[28,]     9     0    0   0      1     0
[29,]     9     0    0   1      0     0
[30,]     9     0    1   0      0     0
[31,]     9     1    0   0      0     0
[32,]    10     0    0   0      0     0
> 
mrip
  • 14,913
  • 4
  • 40
  • 58
  • Thanks a lot, it's exactly what I was looking for. It was a pain to convert it into Perl, though... – Stamm Oct 03 '13 at 21:01