I have a string:

a = babababbaaaaababbbab

And it needs to be shortened so it looks like this:

(ba)3(b)2(a)5ba(b)3ab

So basically it needs to find every run of a repeated character or repeated sequence of characters and write how many times it repeats instead of printing every copy. I managed to do half of this:

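from itertools import groupby
a = 'babababbaaaaababbbab'
# Split the string into runs of identical characters: ['b', 'a', ..., 'bb', 'aaaaa', ...]
grouped = ["".join(grp) for _, grp in groupby(a)]
# Keep only runs of length 2 or more, written as count + character: ['2b', '5a', '3b']
solved = [str(len(i)) + i[0] for i in grouped if len(i) >= 2]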

But this only handles runs of a single repeated character, not repeated patterns. I understand that I could do it by searching the string for the 'ab' pattern, but this needs to work for any possible string. Has anyone encountered something similar?
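For the single-character half on its own, the groupby output can already be written in the target (x)n notation. This is a minimal sketch, and compress_runs is a hypothetical helper name; it deliberately handles only runs of one repeated character:

```python
from itertools import groupby

def compress_runs(text):
    # Hypothetical helper: compresses runs of ONE repeated character only.
    parts = []
    for char, run in groupby(text):
        length = len(list(run))
        # A run of 2+ equal characters becomes "(char)count"; single characters are copied.
        parts.append('({}){}'.format(char, length) if length >= 2 else char)
    return ''.join(parts)

print(compress_runs('babababbaaaaababbbab'))  # bababa(b)2(a)5ba(b)3ab
```

The leading bababa is left untouched because groupby only groups identical adjacent characters, which is exactly the gap described above.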

WholesomeGhost

3 Answers

You can easily do this with regex:

>>> import re
>>> repl = lambda match: '({}){}'.format(match.group(1), len(match.group()) // len(match.group(1)))
>>> re.sub(r'(.+?)\1+', repl, 'babababbaaaaababbbab')
'(ba)3(b)2(a)5ba(b)3ab'

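Not much to explain here. The pattern (.+?)\1+ lazily captures the shortest sequence that is immediately followed by at least one more copy of itself, and the lambda rewrites each match as (sequence)count, where count is the length of the whole match divided by the length of the captured sequence.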
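To see what the pattern actually matches, here is a small sketch using re.finditer, which scans left to right for non-overlapping matches just as re.sub does (find_runs is a hypothetical helper name). The gaps between the matches, ba at index 13 and ab at index 18, are the parts re.sub leaves as they are:

```python
import re

def find_runs(text):
    # Hypothetical helper: list every match of (.+?)\1+ from left to right.
    runs = []
    for m in re.finditer(r'(.+?)\1+', text):
        unit = m.group(1)                    # shortest repeating unit
        count = len(m.group()) // len(unit)  # whole match length / unit length
        runs.append((m.start(), m.group(), unit, count))
    return runs

for run in find_runs('babababbaaaaababbbab'):
    print(run)
# (0, 'bababa', 'ba', 3)
# (6, 'bb', 'b', 2)
# (8, 'aaaaa', 'a', 5)
# (15, 'bbb', 'b', 3)
```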

Aran-Fey
  • Awesome! I think I might set aside some time to master regex. Really good answer. – Chiheb Nexus Jun 09 '17 at 09:17
  • For `aabaabaab`, this gives a rather unintuitive `(a)2(baa)2b` instead of `(aab)3`. Although that's not to say it's wrong - the problem is a bit under-specified. – Bernhard Barker Jun 09 '17 at 10:06
  • @Dukeling That's to conform with OP's statement that `aaabbbaaabbb` is to become `(a)3(b)3(a)3(b)3` - it repeats the shortest possible sequence. If that's undesirable, you can try changing `(.+?)\1+` to `(.+)\1+`, though that also has some weird quirks - for example it'll turn `abababab` into `(abab)2`. – Aran-Fey Jun 09 '17 at 10:20
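A small sketch comparing the lazy (.+?)\1+ and greedy (.+)\1+ patterns on the inputs mentioned in these comments. The regex_compress function and its lazy flag are hypothetical names used only for this comparison:

```python
import re

def regex_compress(text, lazy=True):
    # lazy=True: (.+?)\1+ prefers the shortest repeating unit.
    # lazy=False: (.+)\1+ prefers the longest repeating unit.
    pattern = r'(.+?)\1+' if lazy else r'(.+)\1+'
    return re.sub(pattern,
                  lambda m: '({}){}'.format(m.group(1), len(m.group()) // len(m.group(1))),
                  text)

for s in ['aabaabaab', 'aaabbbaaabbb', 'abababab']:
    print(s, regex_compress(s), regex_compress(s, lazy=False))
# aabaabaab (a)2(baa)2b (aab)3
# aaabbbaaabbb (a)3(b)3(a)3(b)3 (aaabbb)2
# abababab (ab)4 (abab)2
```

Neither pattern picks the "best" unit in every case, so which one to use depends on how the expected output is specified.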

This is what I came up with. The code is a mess, but I just wanted to have some quick fun, so I left it like this:

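a = 'babababbaaaaababbbab'

def compress(text):
    # Try pattern lengths from shortest to longest; a pattern can only repeat
    # if it fits at least twice, hence the upper bound len(text) // 2.
    for size in range(1, len(text) // 2 + 1):
        for start in range(len(text) - size):
            pattern = text[start:start + size]
            new_text = pattern_repeats_processor(pattern, text, start)
            if new_text != text:
                # Something was compressed: start over on the new string.
                return compress(new_text)
    return text

def pattern_repeats_processor(pattern, text, i):
    # Count how many copies of `pattern` occur back to back starting at index i.
    count = 1
    # Use <= so that repeats ending exactly at the end of the string are counted.
    while i + (count + 1) * len(pattern) <= len(text):
        chunk = text[i + count * len(pattern): i + (count + 1) * len(pattern)]
        if chunk != pattern:
            break
        count += 1
    if count > 1:
        # Replace the whole run with "(pattern)count".
        return text[:i] + '(' + pattern + ')' + str(count) + text[i + count * len(pattern):]
    return text

print(compress(a))
print(a)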

It turns babababbaaaaababbbab into (ba)3(b)2(a)5ba(b)3ab.

P.S. Of course, Rowing's answer is miles better; it's pretty impressive.
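One caveat with compress(): because it restarts on the string it has already rewritten, the parentheses and counts it wrote earlier can be treated as repeats themselves. For example, aaaaaaaaaaab (eleven a's and a b) ends up as (a)(1)2b. A possible variant, not part of the answer above, is a single left-to-right pass that builds the output separately, so it never re-scans its own output. At each position it takes the shortest unit that repeats immediately, which should give the same result as the lazy regex. compress_single_pass is a hypothetical name, and this is only a sketch:

```python
def compress_single_pass(text):
    # Output is collected in a separate list, so counts and parentheses
    # that were already emitted are never scanned again.
    out = []
    i = 0
    n = len(text)
    while i < n:
        # Only unit sizes that can fit twice in the remaining text are tried.
        for size in range(1, (n - i) // 2 + 1):
            unit = text[i:i + size]
            if text[i + size:i + 2 * size] == unit:
                # Shortest unit that repeats at i: count consecutive copies.
                count = 2
                while text[i + count * size:i + (count + 1) * size] == unit:
                    count += 1
                out.append('({}){}'.format(unit, count))
                i += count * size
                break
        else:
            # No repeating unit starts here: copy the character as-is.
            out.append(text[i])
            i += 1
    return ''.join(out)

print(compress_single_pass('babababbaaaaababbbab'))  # (ba)3(b)2(a)5ba(b)3ab
print(compress_single_pass('aaaaaaaaaaab'))          # (a)11b
```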

I'm not sure exactly what you're looking for, but here's something; I hope it helps.

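a = 'babababbaaaaababbbab'

# str.count returns the number of NON-overlapping occurrences of a substring
A = a.count('a')
B = a.count('b')
AB = a.count('ab')
BAB = a.count('bab')
BA = a.count('ba')
print(A, '(a)', B, '(b)', AB, '(ab)', BAB, '(bab)', BA, '(ba)')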
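Because str.count skips overlapping occurrences, a.count('bab') returns 4 here even though bab starts at five positions (0, 2, 4, 13 and 17). If overlapping occurrences were wanted instead, one minimal sketch uses a zero-width regex lookahead; count_overlapping is a hypothetical helper name:

```python
import re

a = 'babababbaaaaababbbab'

def count_overlapping(text, sub):
    # The lookahead (?=...) matches at every start position of `sub` without
    # consuming characters, so occurrences that share characters all count.
    return sum(1 for _ in re.finditer('(?=' + re.escape(sub) + ')', text))

print(a.count('bab'), count_overlapping(a, 'bab'))  # 4 5
```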