expand a string of A and B that are compacted

Question

I have a string of A and B like this: "(BA)4B5A", and I want the output to be BABABABABBBBBA. My code only works if the final A is followed by the number 1, as in "(BA)4B5A1". A letter with no number after it should appear just once. I want this to work for any string of A and B.

def extensao(seq):

    new_seq = ""
    i = 0;
    while i < len(seq):
        if seq[i] == '(':
            it = i + 1
            exp = ""
            while seq[it]!= ')':
                exp += seq[it]
                it+=1
            it+=1
            num=""
            while it < len(seq) and seq[it].isdigit() == True:
                num += seq[it]
                it+=1
            x = 0
            while x < int(num):
                new_seq += exp
                x+=1
            i = it
        else:
            char = seq[i]
            it=i+1
            if(seq[it].isdigit()==True):
                num=""
                while it < len(seq) and seq[it].isdigit() == True:
                    num += seq[it]
                    it+=1
                x = 0
                while x < int(num):
                    new_seq += char
                    x+=1
                i = it
            else:
                new_seq+=char
                i+=1
    return new_seq



def main():

    seq = input("Escreva uma sequencia:")
    final_seq = extensao(seq)
    print(final_seq)

main()

This is unclear for me. Can A and B be any random string, or are you looking only for single characters? (Please add multiple examples.) — Scorchio, Jan 13 '18 at 22:48
The string only contains the letters A and B, plus numbers giving how many times the preceding letter or group should be repeated. Letters between () form a motif. (BA)4B5A - BABABABABBBBBA ; B2(AB)3AB2 - BBABABABABB — Bárbara Fonseca, Jan 13 '18 at 23:12

Robᵩ · Accepted Answer · 2018-01-13T23:00:24.180

You might use the re.sub() function, passing a callable as the 2nd argument:

import re

def extensao(seq):
    '"(BA)4B5A", and I want the output to be BABABABABBBBBA'
    return re.sub(r'(([AB])|\(([AB]*?)\))(\d+)',
                  lambda x: (x.group(2) or x.group(3))*int(x.group(4)), seq)

assert extensao("(BA)4B5A") == 'BABABABABBBBBA'

Or, equivalently and perhaps more readably,

import re

def extensao(seq):
    '"(BA)4B5A", and I want the output to be BABABABABBBBBA'
    def replacement(m):
        single_char = m.group(2)
        multi_char = m.group(3)
        count = int(m.group(4))
        char = single_char or multi_char
        return char * count
    pattern = r'''
        (       # Grouping to detect single char or (multi char)
            (.) # Match single char and save it in group 2
            |
            \((.*?)\) # Match (multi char), save inner bit in group 3
        )
        (\d+)   # Save count in group 4
    '''
    # re.VERBOSE lets the pattern carry whitespace and comments.
    return re.sub(pattern, replacement, seq, flags=re.VERBOSE)

assert extensao("(BA)4B5A") == 'BABABABABBBBBA'
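
As a quick check, here is a small sketch that runs the same idea as the first pattern against both examples from the question's comments. Because the pattern requires a count (`\d+`), a letter with no number after it never matches, so re.sub leaves it in place, which gives the repeat-once behaviour for free. The names `PATTERN` and `expand` are only for illustration.

```python
import re

# A single A/B or a parenthesised motif, followed by a mandatory repeat count.
PATTERN = re.compile(r'(?:([AB])|\(([AB]*?)\))(\d+)')

def expand(seq):
    # group(1) is the single letter, group(2) the motif inside (...).
    return PATTERN.sub(
        lambda m: (m.group(1) or m.group(2)) * int(m.group(3)), seq)

print(expand("(BA)4B5A"))    # BABABABABBBBBA
print(expand("B2(AB)3AB2"))  # BBABABABABB
```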

tim-mccurrach · Answer 2 · 2018-01-13T23:34:54.147

Whilst Rob's answer is probably a much better way to go, the algorithm you have used is basically correct. Also, if you're not familiar with regular expressions, they can be a bit overwhelming at first. Having said that, they're definitely worth learning, especially if you're going to do lots of this kind of task.

But since you've clearly gone to a bit of effort to write the algorithm above, I feel it deserves finishing off - tbh it just needed a little bit of tweaking. Below is a 'fixed' version of your code.

If you input something like (BA)4B5A3 it runs fine, but you run into trouble with something like (BA)4B5A. When the loop reaches that final A, your original algorithm checks the next character to see whether it is a digit. There is no next character, so an IndexError is raised. I added an additional if statement to check for that case (as indicated by the comments in the code).

Also, if you are evaluating whether something is True or not, you should say if condition is True: or even just if condition: rather than if condition == True:. That is why I have removed all of the == True comparisons.

def extensao(seq):
    new_seq = ""
    i = 0
    while i < len(seq):
        if seq[i] == '(':
            it = i + 1
            exp = ""
            while seq[it] != ')':
                exp += seq[it]
                it += 1
            it += 1
            num = ""
            while it < len(seq) and seq[it].isdigit():
                num += seq[it]
                it += 1
            x = 0
            while x < int(num):
                new_seq += exp
                x += 1
            i = it
        else:
            char = seq[i]
            it = i+1
            if it < len(seq):  # Check that seq[i] isn't the final character
                if seq[it].isdigit():  # This is the line that was causing the error
                    num = ""
                    while it < len(seq) and seq[it].isdigit():
                        num += seq[it]
                        it += 1
                    x = 0
                    while x < int(num):
                        new_seq += char
                        x += 1
                    i = it
                else:
                    new_seq += char
                    i += 1
            else:  # In case seq[i] was the final character
                new_seq += char
                i += 1
    return new_seq


print(extensao("(BA)4B5A"))
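
For comparison, here is a more compact sketch of the same scanning approach. It is not the asker's code: the function name `expand` and the idea of reading the count once for both branches are assumptions made for illustration. The bounds check `i < len(seq)` is evaluated before `seq[i]` is touched, which is what avoids the error described above, and a missing count defaults to 1.

```python
def expand(seq):
    """Expand strings such as "(BA)4B5A"; a missing count means 1."""
    out = []
    i = 0
    while i < len(seq):
        if seq[i] == '(':
            close = seq.index(')', i)   # raises ValueError if unbalanced
            unit = seq[i + 1:close]
            i = close + 1
        else:
            unit = seq[i]
            i += 1
        # Collect any digits that follow; the bounds check comes first.
        start = i
        while i < len(seq) and seq[i].isdigit():
            i += 1
        count = int(seq[start:i]) if i > start else 1
        out.append(unit * count)
    return "".join(out)


print(expand("(BA)4B5A"))    # BABABABABBBBBA
print(expand("B2(AB)3AB2"))  # BBABABABABB
```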

pylang · Answer 3 · 2018-01-15T18:21:33.327

The more_itertools third-party library has a tool for run length encoding/compression problems. See an example from the docs:

    >>> compressed = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
    >>> list(run_length.decode(compressed))
    ['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd']

This tool is simple to use. Ideally, you would pass in a list of tuples, each comprising a string and an integer multiplier.

Code

Here we will implement a parse helper function to convert your input into the appropriate format.

import itertools as it

import more_itertools as mit


def parse(iterable):
    """Return a list of string, multiplier pairs."""
    iterable = iterable.replace("(", "").replace(")", "")
    pred = lambda x: x.isalpha()
    non_numbers = ("".join(g) for k, g in it.groupby(iterable, pred) if k)
    numbers = (int("".join(g)) for k, g in it.groupby(iterable, pred) if not k)  # join digits so multi-digit counts work
    zipped = list(it.zip_longest(non_numbers, numbers,  fillvalue=1))
    return zipped

Demo

>>> iterable = "(BA)4B5A"

>>> # Application
>>> "".join(mit.run_length.decode(parse(iterable)))
'BABABABABBBBBA'

>>> # Tests
>>> assert parse(iterable) == [("BA", 4), ("B", 5), ("A", 1)]
>>> assert list(mit.run_length.decode(parse(iterable))) == ["BA", "BA", "BA", "BA", "B", "B", "B", "B", "B", "A"]

Details

The parse function removes parentheses from the input iterable. It then builds two generators with itertools.groupby: one for runs of letters and one for the multipliers. The two are zipped together; itertools.zip_longest accepts a fillvalue parameter, so if the input ends with letters (as in the sample input), the default multiplier is 1. Note that because the parentheses are discarded, adjacent letters are always treated as one group, so an input such as B2(AB)3AB2, where a bare A sits next to B2, is not parsed as intended.
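
To make the grouping step concrete, the following sketch shows the intermediate values for the sample input once the parentheses are gone. The variable names (`runs`, `letters`, `counts`, `pairs`) are illustrative and do not appear in the parse function.

```python
import itertools

text = "BA4B5A"  # the sample input with parentheses already removed

# groupby splits the string into alternating runs of letters and digits.
runs = [(is_alpha, "".join(group))
        for is_alpha, group in itertools.groupby(text, str.isalpha)]
print(runs)
# [(True, 'BA'), (False, '4'), (True, 'B'), (False, '5'), (True, 'A')]

letters = [run for is_alpha, run in runs if is_alpha]
counts = [int(run) for is_alpha, run in runs if not is_alpha]

# zip_longest pads the shorter list, so the trailing 'A' gets a count of 1.
pairs = list(itertools.zip_longest(letters, counts, fillvalue=1))
print(pairs)
# [('BA', 4), ('B', 5), ('A', 1)]
```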

The run_length.decode method is implemented here:

class run_length(object):
    ...
    def decode(iterable):
        return list(it.chain.from_iterable(it.repeat(k, n) for k, n in iterable))
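
If installing a third-party package is not an option, the same decoding step can be sketched with the standard library alone. This is a stand-in written for illustration, not the library's own code; the function name `decode` is chosen here only to mirror the method above.

```python
import itertools

def decode(pairs):
    """Yield each item n times for every (item, n) pair."""
    return itertools.chain.from_iterable(
        itertools.repeat(item, n) for item, n in pairs
    )

pairs = [("BA", 4), ("B", 5), ("A", 1)]
print("".join(decode(pairs)))  # BABABABABBBBBA
```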

Note: run pip install more_itertools at a command prompt to install this library.

Additional References

a post on how itertools.groupby works
the source code and the GitHub issue for more details
