How to find the longest repeated adjacent values in a string and put parentheses around them

Question

I am trying to iterate through a string and put parentheses around the longest run of repeated adjacent values. Example of the expected result:

"344556(7777)5412"

max_run = "0"
J = "34455677775412"
for x in range(len(J)-1):
    if J[x] == J[x+1]:
        if J[x:x+2] > max_run:
            print("(", end="")
            max_run = J[x:x+2]
            print(")", end="")

cards · Answer 1 · 2022-07-04T14:42:12.370

6

The `groupby` function from the standard library's `itertools` module groups runs of equal adjacent characters; taking the maximum of those groups by length gives the longest run.

import itertools as it

ref_string = "34455677775412"

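# groupby yields one group per run of equal adjacent characters;
# keep the longest group and join it back into a string
max_squence = ''.join(max((list(j) for _, j in it.groupby(ref_string)), key=len))

# wrap the longest run in parentheses (replace acts on every occurrence of that text)
print(ref_string.replace(max_squence, f'({max_squence})'))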

An alternative for the key line (credit to Kelly Bundy): first join each group into a string, then pick the longest string:

max_squence = max((''.join(j) for _, j in it.groupby(ref_string)), key=len)
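To make the grouping step concrete, here is a minimal sketch of what `groupby` produces for the example string, plus a variant that wraps only the first longest run. The helper name `mark_longest_run` is made up for illustration, and passing a count of 1 to `str.replace` is an assumption about the desired behaviour (mark a single run), not part of the answer above.

```python
import itertools as it

ref_string = "34455677775412"

# groupby yields (key, group-iterator) pairs for consecutive equal items
runs = ["".join(group) for _, group in it.groupby(ref_string)]
print(runs)  # ['3', '44', '55', '6', '7777', '5', '4', '1', '2']


def mark_longest_run(s):
    """Hypothetical helper: wrap the first longest run of s in parentheses."""
    if not s:
        return s  # max() would raise ValueError on an empty sequence
    longest = max(("".join(g) for _, g in it.groupby(s)), key=len)
    # count of 1: the first occurrence of `longest` is the first longest run
    return s.replace(longest, f"({longest})", 1)


print(mark_longest_run(ref_string))  # 344556(7777)5412
```

With the count argument, an input such as "7777177771" gets parentheses around the first "7777" only.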

edited Jul 04 '22 at 14:42

answered Nov 05 '21 at 18:12

cards


Btw could be a little shorter/simpler by joining each group. – Kelly Bundy Jul 04 '22 at 13:22
Ahahaha yes, as well! For sure more compact and readable... but there will be a `join` call for each group. If you don't mind I will add it, is it ok for you? – cards Jul 04 '22 at 13:28
Sure it's ok. And yes, it's probably a bit slower, but if you want fast, you wouldn't call `list`, either :-) – Kelly Bundy Jul 04 '22 at 13:55
Hmm, actually it appears to be a bit *faster*: [benchmark](https://tio.run/##rZJha8IwEIa/3684@qWJBKnTujHoLxGR2p0uXZuE5Dr013dpVLaBXxw7CCS5e5/3wsWd@d2a5Yvz43jwtkfWPWlG3TvrGT05qhkgEA8OK8zzHK4pzeTZ2i5gHeIBwNNhF9hrc4yF2XK1Ksv1c4xytXjKoLXaJMB82sEEgsa@UYiXG8AY2TUn@vokRKcDi1biwXrcKWwxyjXPj94Obn8W32ZSKvygc9WRkTJTd1CbWbv9D057gyTgQ5ykv8EefdUvyF8JWwA6USPSJCVM8mbSphm8JgMXZSzos@5EIyVAKtpNRb42RxJLeam7p52C4yj72N3l04hGYTJTaIZ@T75aFEWhrl@qKotocVNenKfFOMMFrWPvuQmYK2zkj97kOH4B) with three more variations. – Kelly Bundy Jul 04 '22 at 14:09
Thanks for sharing, nice also the `*` tricks! – cards Jul 04 '22 at 14:15
Hmm, why did you switch to tuple? – Kelly Bundy Jul 04 '22 at 14:25
I thought `tuple` would be less time-consuming than `list`, kind of lighter because it has fewer methods and so on, but I was wrong! I also ran a test with a string of about 1k characters, and `tuple` is always worse. By the way, for long strings the list-like approaches are more performant than `join`. – cards Jul 04 '22 at 14:42
Yeah, tuple is slightly "lighter" and faster to iterate/access, but in my experience slower to build. – Kelly Bundy Jul 04 '22 at 14:46
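
The trade-off discussed in these comments can be checked locally with `timeit`. This is only a rough sketch: the test string, the repetition count and the function names are arbitrary choices made here, and timings vary with Python version and input, so no particular ordering is claimed.

```python
import itertools as it
import timeit

# Arbitrary test input (an assumption): the example string repeated many times
ref_string = "34455677775412" * 100


def via_list(s):
    # materialise each group as a list, join only the winner
    return "".join(max((list(g) for _, g in it.groupby(s)), key=len))


def via_tuple(s):
    # same idea, but with tuples instead of lists
    return "".join(max((tuple(g) for _, g in it.groupby(s)), key=len))


def via_join(s):
    # join every group to a string, then pick the longest string
    return max(("".join(g) for _, g in it.groupby(s)), key=len)


results = {}
for func in (via_list, via_tuple, via_join):
    results[func.__name__] = func(ref_string)
    seconds = timeit.timeit(lambda: func(ref_string), number=200)
    print(f"{func.__name__}: {seconds:.4f} s for 200 calls")
```

All three variants return the same longest run; only the intermediate container differs.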

score 1 · Answer 2 · answered Nov 05 '21 at 18:19

Love itertools, but as there is already a (nice) solution with groupby, here is one with a classical loop:

J = "34455677775412"

run = []
prev = None
for pos, char in enumerate(J):
    if char == prev:
        run[-1][0] += 1
    else:
        run.append([1, pos])
    prev = char
print(run)
a,b = max(run, key=lambda x: x[0])

J[:b]+'('+J[b:b+a]+')'+J[b+a:]

output: '344556(7777)5412'
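
The same loop can be wrapped in a function so that it returns the result instead of relying on the interactive prompt to echo the last expression. A sketch, assuming that an empty string should be returned unchanged; the name `mark_longest_run_loop` is invented for this example.

```python
def mark_longest_run_loop(s):
    """Hypothetical wrapper around the loop above."""
    runs = []  # each entry is [length, start_index]
    prev = None
    for pos, char in enumerate(s):
        if char == prev:
            runs[-1][0] += 1  # extend the current run
        else:
            runs.append([1, pos])  # start a new run at this position
        prev = char
    if not runs:
        return s  # empty input: nothing to mark
    length, start = max(runs, key=lambda r: r[0])
    end = start + length
    return s[:start] + "(" + s[start:end] + ")" + s[end:]


print(mark_longest_run_loop("34455677775412"))  # 344556(7777)5412
```

Because `max` returns the first maximal element, ties are resolved in favour of the leftmost run.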

score 0 · Answer 3 · answered Nov 05 '21 at 18:18

In case you can't use any standard library helpers like groupby, here's a plain Python implementation that does the same thing:

i = 0
max_start, max_end = 0, 0
J = "34455677775412"
# find the longest repeating sequence
while i < len(J):
    j = i
    while j < len(J) and J[j] == J[i]:
        j += 1
    max_start, max_end = max([(max_start, max_end), (i, j)], key=lambda e: e[1] - e[0])
    i = j
print(max_start, max_end, J[max_start:max_end])

J = J[:max_start] + "(" + J[max_start:max_end] + ")" + J[max_end:]  # insert the parentheses
print(J)

score 0 · Answer 4 · answered Nov 05 '21 at 22:14

You could also use Python's regular-expression module `re` to get a solution similar to @cards's:

import re

J = "34455677775412"
pattern = r'(.)\1+'

# key=len is needed: without it, max compares the strings lexicographically
longest_sequence = max([match.group() for match in re.finditer(pattern, J)], key=len)
print(J.replace(longest_sequence, f'({longest_sequence})'))
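
Because `str.replace` substitutes every occurrence of the matched text, an input that contains the same longest run twice would get two pairs of parentheses. If only the first longest run should be marked, one option is to keep the match object and use its `span()`. A sketch under that assumption; the function name is hypothetical, and note that the pattern `(.)\1+` only matches runs of at least two characters.

```python
import re


def mark_longest_run_re(s):
    """Hypothetical helper: parenthesise the first longest run of length >= 2."""
    # default=None avoids a ValueError when the string has no repeated run
    best = max(re.finditer(r"(.)\1+", s),
               key=lambda m: m.end() - m.start(),
               default=None)
    if best is None:
        return s  # no repeated adjacent characters at all
    start, end = best.span()
    return s[:start] + "(" + s[start:end] + ")" + s[end:]


print(mark_longest_run_re("34455677775412"))  # 344556(7777)5412
print(mark_longest_run_re("7717755"))         # (77)17755
```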
