Iterating over all possible tosses is definitely not going to work for large n (and here n is already large from 20 I guess). One can only calculate this efficiently using some combinatorics.
Let's first start with a simple example: minimum two groups for four tosses: if we want to calculate the number of tosses that result in two our more groups. There are a few possibilities (we number the groups like 1
, 2
, etc.):
Two groups:
1112
1122
1222
Three groups:
1123
1223
1233
Four groups:
1234
We know that the value of group G1
will be different then the one for G2
and the one for G2
different from G3
and so on. So it holds that:
G1 != G2 != G3 != ... != Gn
Since there are only two values (heads and tails) this means that if we determine the first group, we have deterimed the value for all groups. If G1
is heads, all even groups are tails and all odd groups are heads.
So this means that for every combination above, there are exactly two configurations. This means that for the case n=4,m=1
, there are 2×7=14 configurations (which is exactly what we get with get when we work it out with the program in your question).
Now the only problem we still have to face is how we are going to count these super-configurations. We can do this by introducing what I would call upgrade notation:
You notate an increase of the group with 1
and the same group with 0
.
So now 1223
becomes 0101
: we upgrade the group on the second and the fourth index. And 1234
is 0111
. How does this help? Well in fact for k groups, we only have to count the number of combinations with k-1 upgrades. So this means the number of combinations (k-1 nCr n-1). With n the length of the string (or the total number of tosses), and k the number of groups*. Now (k-1 nCr n) is defined as: (n-1)!/((k-1)!*(n-k)!). With !
the factorial. I borrowed a way to calculate nCr
from here:
import operator as op
from functools import reduce
def ncr(n, r):
r = min(r, n-r)
if r == 0: return 1
return reduce(op.mul,range(n-r+1,n+1))//reduce(op.mul,range(1,r+1))
And now the last step is only to calculate the number of combinations (times two) for every k
higher than m
up to n
. So:
def group_num(n,m):
total = 0
for i in range(m+1,n+1):
total += ncr(n-1,i-1)
return 2*total
or putting it into a one-liner:
def group_num(n,m):
return 2*sum(ncr(n-1,i-1) for i in range(m+1,n+1))
Now we can test our code:
>>> group_num(4,1)
14
>>> group_num(10,6)
260
>>> group_num(200,110)
125409844583698900384745448848066934911164089598228258053888
>>> group_num(500,260)
606609270097725645141493459934317664675833205307408583743573981944035656294544972476175251556943050783856517254296239386495518290119646417132819099088
Which are the expected numbers (for the first two). As you can see the total amount blows up enormously so even the fastest algorithm to count the number of tosses one at a time will easily be outperformed by this approach (the results were calculated in less than a second).
The ncr
function can be calculated in O(n) (but this can even be improved by using a memoize on the factorial for instance); and the group_num
function is thus computed (without memoize optimization) in O((n-m)×n). Which thus completely outfactors the exponential behavior.