We can easily produce valid strings using itertools.product
to produce tuples of 'a' and 'b' of the required length, join the tuples into strings, and then filter out strings that don't contain 'bbb'.
We don't need to store all the strings in a list. Instead, we produce them using a generator function.
To count the number of valid strings of a given length we still don't need to make a list. We can easily count the strings yielded by the generator using the built-in sum function and a generator expression.
Here's a short demo.
from itertools import product
def make_valid(n):
for s in map(''.join, product('ab', repeat=n)):
if 'bbb' in s:
yield s
# Print all the valid strings for n = 5
for i, t in enumerate(make_valid(5), 1):
print(i, t)
print()
# Count the number of valid strings for some small values of n
for n in range(16):
print(n, sum(1 for _ in make_valid(n)))
print()
output
1 aabbb
2 abbba
3 abbbb
4 babbb
5 bbbaa
6 bbbab
7 bbbba
8 bbbbb
0 0
1 0
2 0
3 1
4 3
5 8
6 20
7 47
8 107
9 238
10 520
11 1121
12 2391
13 5056
14 10616
15 22159
This strategy is ok for small n
, but we need a formula to calculate those counts for larger n
. Fortunately, this sequence can be found in the OEIS as sequence A050231, which lists a couple of useful formulae. It even has some Python code, although we can make a more efficient function using a different formula. Using this formula we can easily calculate the counts for large n
, although for n > 1000
we may need to increase the recursion limit (depending on what's in the cache
dictionary).
import sys
def valid_num(n, cache={0:0, 1:0, 2:0, 3:1, 4:3}):
if n in cache:
return cache[n]
v = cache[n] = 2 * valid_num(n-1) - valid_num(n-4) + (1 << (n-4))
return v
# Calculate the same counts using the formula
for n in range(16):
print(n, valid_num(n))
print()
# Calculate some larger counts using the formula
for n in (20, 100, 1000):
print(n, valid_num(n))
print()
# Calculate the count for n = 10000
sys.setrecursionlimit(10000)
n = 10000
v = valid_num(n)
print(n, 'length:', len(str(v)))
print(v)
output
0 0
1 0
2 0
3 1
4 3
5 8
6 20
7 47
8 107
9 238
10 520
11 1121
12 2391
13 5056
14 10616
15 22159
20 825259
100 1267318799554307616411887824515
1000 10715086071862673209484250490600018100539745081012589876867705301975463230239516162056449817835254127137269714485423444296635751843450469485667983663407684446390757637525684523851088159545633725265629658895944476385164970179813685921539685461239078098993547717387680879133627403305119486713242032255067
10000 length: 3011
19950631168807583848837421626835850838234968318861924548520089498529438830221946631919961684036194597899331129423209124271556491349413781117593785932096323957855730046793794526765246551266059895520550086918193311542508608460618104685509074866089624888090489894838009253941633257850621568309473902556912388065225096643874441046759871626985453222868538161694315775626089623134328073983786389675177701186558011272310697526877798064028762732641567763583009365923237958579619694722743012623795847397854998863572912757259022370929851354671535479510177534365020486167704487189515540498377935792784729998056204236193219552902248837389484924415956257294989763770669720233733644286583000382759075539053082652941408713883472715627420221687140691149326172458181449913074158357403912912879276902845540785568622960033810574475726502542130288984975487755939111059374407249361193935123107325839835567845152301152599696481119763066457621100802588552680671318557266858044791840868082518099393177406617354957371241176940107570738702088618534197370980151722230231186806124846732235800482635363983329963660230357785349549545506065669831114315218532610109374476503772297612747522883328342787820540011850973090403518483383437648807620569472507786915081183978792023340064365641436883629372337926937368346416759281652638487818090591411436604449009422265319311879409918048008416090298359064134661715823412371674763461157363128721685818760472532914713274785795923563234731723132316960019917396064130633819807185968362914576441538717071994274286406827253010143879860255399415078316174263450870289088692073427388192608054285426524914418151339875321474768984105710841294710811517295773844059882211433112941200409216807692338045919038698016225727309613247405118664483317505367054663735478905020533879998277077423479999938238338161234544356443164377399933151023535182334388940451107785030288175719971170053676007075190227062172140481425461917515455479972068054339784181607496627198030056906424447389442115111598117425687643181400799735513708227684305240112451551816573610657557740466165389046532852242049143998343971456857060951994667027419070327679375387245482516968508426783900751557991025654896592270261372186844112570682419026071390817038382780816857411320558427785718592835834380922916168890933210737876196968898314180514932819277476011379800392972374348601006573628313908492614955299976109070068900439705816010323545500438056558640731711137133200529511554379646269211769945584022230495812252890259551503449397117011713619252886812420071394209078307064109175851940790347483097635133334458431432349757774924271783333143566842831599567399569263816537290034939347896683277449442140167815797283546947849352457384014905698858773315056621125677551281637613936596108267979291171314129517832750587062076381907831127494015516619913002000219217551370967381359461580273960378080346580481200727681280258846559027048156913034694942538168049000091115128411182728198666360471953351549972522903396839735125178400179203500439758780473093919765016217623879
Adirio mentions in the comments that we don't need to initialise the cache with 4: 3
, and that we can make this version a little more efficient by using a list instead of a dict for the cache. This works because the recursion guarantees that any new item added to the cache will always have an index 1 greater than that of the current last item in the cache.
Here's the improved version:
def valid_num(n, cache=[0, 0, 0, 1]):
if n >= len(cache):
cache.append(2 * valid_num(n-1) - valid_num(n-4) + (1 << (n-4)))
return cache[n]
Thanks, Adirio!