-1

A string is valid if and only if it has only two characters a and b in it (not necessary that a should be present) and it must have three b's in it in succession in order to be valid. For example for string of length n=4 [bbbb,bbba,abbb] are valid strings that are totally valid strings are 3 and for n=3 there is only one valid string [bbb]. n can be any integer up to 10**4. My approach to this problem can be naive as I am new to programming but this is how I started:

import itertools
x=list(input().strip())
test=True
lis=[]
for _ in (x):
    if _ == 'a' or _=='b':
        continue
    else:
        test=False
        break

if test:
    y=list(itertools.permutations(x))
    print(y)

Now I am looking for a pattern for n=5,6,7 and then implement it for the general but I want to eliminate all the invalid strings from list y that doesn't satisfy above conditions.I need to print count of all valid strings for a a particular integer n which corresponds to length of the string .

Demonking28
  • 749
  • 7
  • 21
  • You should probably give your looping variable a better name than `_`. – khelwood Oct 19 '17 at 13:18
  • 6
    You need to clarify the problem requirements. Are you supposed to _recognize_ these strings, or print them all, or what? – Davis Herring Oct 19 '17 at 13:18
  • 4
    You could check each string against the [regular expression](https://en.wikipedia.org/wiki/Regular_expression) `[ab]*bbb[ab]*`. – alex kucksdorf Oct 19 '17 at 13:21
  • 1
    @DavisHerring I need to print all valid strings for a particular length of integer n.Here i am trying to find a pattern – Demonking28 Oct 19 '17 at 13:21
  • So you need to put that info into the question itself, don't bury it down here in the comments. – PM 2Ring Oct 19 '17 at 13:22
  • Also not sure what your input `x` is. Is it the number of characters in the string, number of strings or the string itself? – anupsabraham Oct 19 '17 at 13:25
  • @anupsabraham it is a random string of length of n,here i am just trying to find out the pattern .for example if n=6 i can take a list which has bbbaaa and then work on it – Demonking28 Oct 19 '17 at 13:27
  • 1
    "`n` can be any integer up to `10**4`". It will take a huge amount of time to generate all valid permutations for large `n`, since the total number is approximately `2**n` – PM 2Ring Oct 19 '17 at 13:32
  • @PM2Ring any other approach for the problem? – Demonking28 Oct 19 '17 at 13:36
  • My approach could be probably improved memory wise by using generators but for n in a decent range should work without problems. Also the recursion limit should be taken into account. – Adirio Oct 19 '17 at 13:41
  • @Demonking28, please avoid writing yourself implementations that already exist in the language. See my answer to understand how this recursion can be avoided using `itertools` – Daniel Trugman Oct 19 '17 at 13:46
  • 1
    Do you need to print all the valid strings? Or do you merely need to print _the number_ of valid strings? – PM 2Ring Oct 19 '17 at 13:59
  • @PM2Ring no of valid strings – Demonking28 Oct 19 '17 at 14:20
  • Ok. The total number of valid strings for `n = 10**4` is a number with 3011 decimal digits. For `n = 100`, it's "only" 1267318799554307616411887824515. ;) – PM 2Ring Oct 19 '17 at 14:40

4 Answers4

1

A simple approach for this would be generating all the possible combinatios of strings of a certain length with only chars 'a' and 'b' and then testing if there is the 'bbb' pattern inside them.

def recursive_function(n, seed=None, chars="ab"):
    if seed is None:
        seed = ['']
    result = []
    for char in chars:
        result.extend(list(map(lambda x: x + char, seed)))
    if n-1:
        return recursive_function(n-1, result, chars)
    else:
        return result

unfiltered = recursive_function(5)

filtered = list(filter(lambda x: 'bbb' in x, unfiltered))

print(filtered)

The function appends to every str inside the parameter seed the options in chars. So if seed = ['var'] and chars = "12" => result = ['var1', 'var2']. Before doing that we are creating a list with an empty str if we do not provide it. After it we are checking if we have reached the desired length or calling to ourselves but reducing n by 1.

This generates a list of length equal to the length of the chars variable to the power of n: len(chars)**n. This means that you will get soon out of memory n=100 is giving me memory issues, depends on your mahcine, setup, ...

Then we filter it and convert it back to a list using a lambda function that checks if it has the substring 'bbb'.

Another solution would be using itertools:

from itertools import product

unfiltered = map(''.join, product('ab', repeat=5))
#unfiltered = (''.join(x) for x in product('ab', repeat=5)) # suggested by @DanielTrugman

filtered = filter(lambda x: 'bbb' in x, unfiltered)
#filtered = (x for x in unfiltered if 'bbb' in x) # suggested by @DanielTrugman

i = sum(1 for _ in filtered) # suggested by @PM2Ring

#for i, _ in enumerate(filtered):
#    pass
#i += 1

#i = 0
#for _ in filtered:
#    i += 1

print(i)

I gave some alternatives to each step as sugested by @DanielTrugman and @PM2Ring, any combination of them should work. The key point in this new approach is that they are never creating a list, which was the one giving memory problems before, they are creating generators. A generator can be used to generate the next element and can be used in for loops but doesn't store the full content in memory (this also means that once I iterate over the generator, I would need to create it again if I want to use it in another for loop). By using enumerate each element is transormed into a tuple of the form (i, element) being i an auto-incremented integer starting at 0 so when the loop ends i equals the number of elements minus 1. An alternative to this last part would be to do it manually.

Remember that you can't use both as the second time the generator would be empty, either you recreate it re executing both the unfiltered and filtered part or you just use one of the methods.

NOTE: THIS WILL TAKE AGES FOR N=10^5

A further answer by @PM2Ring gives information on how you could calculate the result with a numeric formula that will be mroe effficient that iterating over the possibilities, even if they are using generators.

Adirio
  • 5,040
  • 1
  • 14
  • 26
  • 1
    You should use standard methods when possible! `itertools.combinations_with_replacement` does exactly that! – Daniel Trugman Oct 19 '17 at 13:43
  • @Adirio Thanks ,this is what i was looking for :) – Demonking28 Oct 19 '17 at 13:47
  • @DanielTrugman those are ordered and in `tuple` form – Adirio Oct 19 '17 at 13:52
  • @Adirio, and what's wrong with that? – Daniel Trugman Oct 19 '17 at 13:53
  • @Adirio but your solution fails for n=10**5 – Demonking28 Oct 19 '17 at 13:56
  • @DanielTrugman `itertools.product` would be closer to what we want, as those are unordered. We would only need to transform from the tuple to the str. – Adirio Oct 19 '17 at 13:56
  • @Adirio, you are right. Fixed my implementation. – Daniel Trugman Oct 19 '17 at 14:04
  • @Demonking28 it raises a MemoryError because the `list` we are trying to create is too big. – Adirio Oct 19 '17 at 14:16
  • @Adirio is there any possible way to print the no of valid strings without printing the valid strings so that the script runs well for n=10**5? – Demonking28 Oct 19 '17 at 14:33
  • Sure, there is, by not computing the list. Ill edit my answer. – Adirio Oct 19 '17 at 14:35
  • @Adirio thanks,you are awesome :) Since i am new to programming people like you is what motivates a rookie like me to dive deep into it! – Demonking28 Oct 19 '17 at 14:37
  • 1
    FWIW, an elegant way to count the number of items that a generator yields is to pass a generator expression to `sum`, eg `sum(1 for _ in filtered)`. – PM 2Ring Oct 19 '17 at 15:09
  • @PM2Ring I really like the `sum` way. Its compact and uses generators plus `1`, which in most Python implementations will be very memory efficient as they keep low numbers into memory permanent for faster access and less memory usage. – Adirio Oct 20 '17 at 06:01
  • Yes, the CPython interpreter pre-stores the integers from -5 to 256 in a cache. However, it's also pretty good at caching constants that can be determined at compile time, so (for example) if a function uses the literal 1000 it doesn't need to re-build that object each time the function is called. – PM 2Ring Oct 20 '17 at 06:09
1

You could use this to generate all possible combinations for a given length n and then filter the invalid ones.

A faster method that uses comprehension (prefer for smaller n values):

n = 10
combinations = [''.join(x) for x in itertools.product('ab', repeat=n)]
valid = [x for x in combinations if 'bbb' in x]
print len(valid)

A RAM efficient method that uses generators (prefer for bigger n values):

n = 10
combinations = (''.join(x) for x in itertools.product(['a', 'b'], repeat=n))
valid = (x for x in combinations if 'bbb' in x)
print sum(1 for _ in valid)
Daniel Trugman
  • 8,186
  • 20
  • 41
1

We can easily produce valid strings using itertools.product to produce tuples of 'a' and 'b' of the required length, join the tuples into strings, and then filter out strings that don't contain 'bbb'.

We don't need to store all the strings in a list. Instead, we produce them using a generator function.

To count the number of valid strings of a given length we still don't need to make a list. We can easily count the strings yielded by the generator using the built-in sum function and a generator expression.

Here's a short demo.

from itertools import  product

def make_valid(n):
    for s in map(''.join, product('ab', repeat=n)):
        if 'bbb' in s:
            yield s

# Print all the valid strings for n = 5
for i, t in enumerate(make_valid(5), 1):
    print(i, t)
print()

# Count the number of valid strings for some small values of n
for n in range(16):
    print(n, sum(1 for _ in make_valid(n)))
print()

output

1 aabbb
2 abbba
3 abbbb
4 babbb
5 bbbaa
6 bbbab
7 bbbba
8 bbbbb

0 0
1 0
2 0
3 1
4 3
5 8
6 20
7 47
8 107
9 238
10 520
11 1121
12 2391
13 5056
14 10616
15 22159

This strategy is ok for small n, but we need a formula to calculate those counts for larger n. Fortunately, this sequence can be found in the OEIS as sequence A050231, which lists a couple of useful formulae. It even has some Python code, although we can make a more efficient function using a different formula. Using this formula we can easily calculate the counts for large n, although for n > 1000 we may need to increase the recursion limit (depending on what's in the cache dictionary).

import sys

def valid_num(n, cache={0:0, 1:0, 2:0, 3:1, 4:3}):
    if n in cache:
        return cache[n]
    v = cache[n] = 2 * valid_num(n-1) - valid_num(n-4) + (1 << (n-4))
    return v

# Calculate the same counts using the formula
for n in range(16):
    print(n, valid_num(n))
print()

# Calculate some larger counts using the formula
for n in (20, 100, 1000):
    print(n, valid_num(n))
print()

# Calculate the count for n = 10000
sys.setrecursionlimit(10000)
n = 10000
v = valid_num(n)
print(n, 'length:', len(str(v)))
print(v)

output

0 0
1 0
2 0
3 1
4 3
5 8
6 20
7 47
8 107
9 238
10 520
11 1121
12 2391
13 5056
14 10616
15 22159

20 825259
100 1267318799554307616411887824515
1000 10715086071862673209484250490600018100539745081012589876867705301975463230239516162056449817835254127137269714485423444296635751843450469485667983663407684446390757637525684523851088159545633725265629658895944476385164970179813685921539685461239078098993547717387680879133627403305119486713242032255067

10000 length: 3011
19950631168807583848837421626835850838234968318861924548520089498529438830221946631919961684036194597899331129423209124271556491349413781117593785932096323957855730046793794526765246551266059895520550086918193311542508608460618104685509074866089624888090489894838009253941633257850621568309473902556912388065225096643874441046759871626985453222868538161694315775626089623134328073983786389675177701186558011272310697526877798064028762732641567763583009365923237958579619694722743012623795847397854998863572912757259022370929851354671535479510177534365020486167704487189515540498377935792784729998056204236193219552902248837389484924415956257294989763770669720233733644286583000382759075539053082652941408713883472715627420221687140691149326172458181449913074158357403912912879276902845540785568622960033810574475726502542130288984975487755939111059374407249361193935123107325839835567845152301152599696481119763066457621100802588552680671318557266858044791840868082518099393177406617354957371241176940107570738702088618534197370980151722230231186806124846732235800482635363983329963660230357785349549545506065669831114315218532610109374476503772297612747522883328342787820540011850973090403518483383437648807620569472507786915081183978792023340064365641436883629372337926937368346416759281652638487818090591411436604449009422265319311879409918048008416090298359064134661715823412371674763461157363128721685818760472532914713274785795923563234731723132316960019917396064130633819807185968362914576441538717071994274286406827253010143879860255399415078316174263450870289088692073427388192608054285426524914418151339875321474768984105710841294710811517295773844059882211433112941200409216807692338045919038698016225727309613247405118664483317505367054663735478905020533879998277077423479999938238338161234544356443164377399933151023535182334388940451107785030288175719971170053676007075190227062172140481425461917515455479972068054339784181607496627198030056906424447389442115111598117425687643181400799735513708227684305240112451551816573610657557740466165389046532852242049143998343971456857060951994667027419070327679375387245482516968508426783900751557991025654896592270261372186844112570682419026071390817038382780816857411320558427785718592835834380922916168890933210737876196968898314180514932819277476011379800392972374348601006573628313908492614955299976109070068900439705816010323545500438056558640731711137133200529511554379646269211769945584022230495812252890259551503449397117011713619252886812420071394209078307064109175851940790347483097635133334458431432349757774924271783333143566842831599567399569263816537290034939347896683277449442140167815797283546947849352457384014905698858773315056621125677551281637613936596108267979291171314129517832750587062076381907831127494015516619913002000219217551370967381359461580273960378080346580481200727681280258846559027048156913034694942538168049000091115128411182728198666360471953351549972522903396839735125178400179203500439758780473093919765016217623879

Adirio mentions in the comments that we don't need to initialise the cache with 4: 3, and that we can make this version a little more efficient by using a list instead of a dict for the cache. This works because the recursion guarantees that any new item added to the cache will always have an index 1 greater than that of the current last item in the cache.

Here's the improved version:

def valid_num(n, cache=[0, 0, 0, 1]):
    if n >= len(cache):
        cache.append(2 * valid_num(n-1) - valid_num(n-4) + (1 << (n-4)))
    return cache[n]

Thanks, Adirio!

PM 2Ring
  • 54,345
  • 6
  • 82
  • 182
  • As my answer is the accepted one but I find your information about the formulas quite important, I would like either to link to your answer or include this info in the accepted answer. Which do you prefer? I would obviously give you credit for it if I directly include the info in my answer. – Adirio Oct 20 '17 at 06:48
  • @Adirio Thanks for asking. I prefer if you just link to my answer, there's not much point repeating info that's already on the page. – PM 2Ring Oct 20 '17 at 06:54
  • As this was changed to the accepted answer, I would like to suggest a couple of things: 1) The 4th value of the cache doesn't need to be given at initialization:` valid_num(4) = 2 * valid_num(3) - valid_num(0) + (1 << 0) = 2 * 1 - 0 + 1 = 3`. 2) The cache could be stored in a `list` as it should be more memory efficient. I will edit your answer to show how I would do it, accept it or modify it at will. – Adirio Oct 30 '17 at 08:06
  • @Adirio Thanks for the suggestions. It didn't occur to me that a list would work here. Generally, edits that make changes to code (apart from simple formatting changes) will be rejected as conflicting with the author's intent, although if I'd been online at the time I probably would have accepted and improved your edit. – PM 2Ring Oct 30 '17 at 14:28
0

This can be done by using simple regex matching. First check whether the string contains only a's and b's. Then, create all the combinations using itertools and filter the ones you need using regex.

def return_valid_strings(string):
    if re.match("^[ab]*$", string):
        y = ["".join(x) for x in itertools.permutations(string) if re.match("^[ab]*bbb[ab]*$", "".join(x))]
    return list(set(y))
anupsabraham
  • 2,781
  • 2
  • 24
  • 35