Trying to find longest Uniform Substring

Question

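I'm trying to find the longest uniform substring, i.e. the longest run of identical consecutive characters, and return its start index and length.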

Suppose I have "abbbccda". Then I need the index position of "bbb", i.e. [1, 3], because that uniform substring starts at index 1 and is 3 characters long.

Other examples:

"10000111" => [ 1, 4 ]

"aabbbbbCdAA" => [ 2, 5 ]
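If the expected output is read as a start index followed by a length, then slicing s[start:start + length] should recover a run made of a single repeated character. A small check of the three examples above under that reading (so the b-run in "aabbbbbCdAA" covers indices 2 through 6, and its length is 5):

```python
# expected outputs from the question: [start index, run length]
examples = {"abbbccda": [1, 3], "10000111": [1, 4], "aabbbbbCdAA": [2, 5]}

for s, (start, length) in examples.items():
    run = s[start:start + length]  # slice out the claimed run
    # a uniform run contains exactly one distinct character
    print(s, run, len(set(run)) == 1)
```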

What Python code would solve this?

My code is very long. Is there a shorter way? Ignore the print calls; there are so many only so I can see the intermediate output.

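x="aaaabbbbCdAA"

# collect the distinct characters in order of first appearance
LIST1=[]
for char in x:
    if(char not in LIST1 ):
        LIST1.append(char)

print(LIST1)

# total number of occurrences of each distinct character
# (str.count counts every occurrence, not only contiguous ones)
list1=[]

for i in LIST1:
    list1.append(x.count(i))

print(list1)

# highest occurrence count
Max_length_Charcater= max(list1)
print(Max_length_Charcater)

# position of that count in list1 (an index into LIST1, not into x)
index_Max_length_Charcater=list1.index(Max_length_Charcater)
print(index_Max_length_Charcater)

# the most frequent character
y=LIST1[index_Max_length_Charcater]
print(y)

# l starts at the LIST1 index, then adds one for every occurrence of y in x
l=index_Max_length_Charcater
start_of_max_length_character=x.find(y)

for i in range(len(x)):
    if(x[i]==y):
        l+=1
print(l)

print("({0},{1})" .format(start_of_max_length_character,l))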
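The count-based approach above measures how often a character occurs anywhere in the string, which is not the same as the length of its longest contiguous run. A minimal sketch of the difference on "10000111", using itertools.groupby to measure runs (this snippet is illustrative and not part of the question's code):

```python
from itertools import groupby

s = "10000111"

# str.count counts every occurrence, contiguous or not:
# one '1' at index 0 plus the three at the end
total_ones = s.count("1")

# groupby splits the string into runs of identical characters;
# keep only the runs of '1' and take the longest
longest_one_run = max(len(list(run)) for ch, run in groupby(s) if ch == "1")

print(total_ones, longest_one_run)  # 4 3
```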

There are many different ways to solve this. What problem occurs with your way of solving it? — user8408080, May 05 '20 at 18:19
Hints: 1) use a `set` to determine the unique characters in each string; 2) use a regex or Python string methods to find the lengths of the runs of those unique characters; 3) find the index of the longest run. — dawg, May 05 '20 at 18:31
Shouldn't the output to `"aabbbbbCdAA"` be `[2, 6]` instead? — Jake Tae, May 05 '20 at 18:31

Alain T. · Answer 1 · 2020-05-17T02:01:06.103

You can use a list comprehension with zip to pair each character with its predecessor and identify the positions where contiguous streaks break. Then, from that list of positions, use zip again to obtain the position ranges (from one break to the next), which you can convert to a list of (start, length) tuples. The tuple with the largest length is the one you want.

string = "aabbbbbCdAA"

breaks = [i for i,(a,b) in enumerate(zip(string,string[1:]),1) if a!=b]
ranges = [ (s,e-s) for s,e in zip([0]+breaks,breaks+[len(string)]) ]
print(max(ranges,key=lambda r:r[1]))

The breaks list will contain [2, 7, 8, 9], which are the start positions of the letter groups (position zero is implied).

The ranges list is formed by pairing each group's start with the start of the next group, or with the string length for the last group (again using zip). This allows the size of each group's repetition to be calculated: [(0, 2), (2, 5), (7, 1), (8, 1), (9, 2)]
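The same two steps can be wrapped in a small function so the intermediate ranges are easy to inspect for another input. A sketch; the function name streak_ranges is hypothetical, and the body follows the zip-based approach above:

```python
def streak_ranges(string):
    """Return (start, length) for every run of identical characters."""
    # positions where a character differs from its predecessor
    breaks = [i for i, (a, b) in enumerate(zip(string, string[1:]), 1) if a != b]
    # pair each run start with the next run start (or the end of the string)
    return [(s, e - s) for s, e in zip([0] + breaks, breaks + [len(string)])]

ranges = streak_ranges("10000111")
print(ranges)                           # [(0, 1), (1, 4), (5, 3)]
print(max(ranges, key=lambda r: r[1]))  # (1, 4)
print(streak_ranges(""))                # [(0, 0)]
```

For an empty string, breaks is empty and the final zip still produces one (0, 0) pair, so max does not raise.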

If you feel courageous and want to delve into regular expressions, the re module provides a way to get the substrings of repeated letters directly:

import re

string     = "aabbbbbCdAA"

streaks,_  = zip(*re.findall(r"((.)\2*)",string))
longest    = max(streaks,key=len)

print(string.index(longest),len(longest))
# 2 5
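A variation on the same regex (not from the original answer): re.finditer yields match objects, so each run's start position comes from match.start() directly instead of a second search with string.index. The re.DOTALL flag is an added assumption so that '.' also matches newline characters:

```python
import re

string = "aabbbbbCdAA"

# each match covers one run; m.start() is its position, m.group() the run itself
runs = [(m.start(), len(m.group())) for m in re.finditer(r"(.)\1*", string, re.DOTALL)]

# max returns the first maximal element, so ties keep the earliest run
start, length = max(runs, key=lambda r: r[1])
print(start, length)  # 2 5
```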

dawg · Answer 2 · 2020-05-05T19:09:56.153

Here is a regex solution:

import re

strs=("10000111", "aabbbbbCdAA", "abbbccda")

for s in strs:
    uniq=set(s)
    # re.escape so characters such as '.' or '*' are matched literally
    mss=max([max(re.findall(f'{re.escape(c)}+', s), key=len) for c in uniq], key=len)
    print(f'{s}: {s.index(mss)}, {len(mss)}')

Prints:

10000111: 1, 4
aabbbbbCdAA: 2, 5
abbbccda: 1, 3
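The pattern f'{c}+' puts the character straight into a regular expression, so a character with special meaning in regex syntax is not matched literally. A minimal sketch with '.', which is a wildcard unless escaped with re.escape:

```python
import re

s = "a..bbb"

# unescaped: '.' is a wildcard, so '.+' matches the whole string
unescaped = max(re.findall(".+", s), key=len)

# escaped: re.escape('.') yields a pattern that matches a literal dot only
escaped = max(re.findall(re.escape(".") + "+", s), key=len)

print(unescaped)  # a..bbb
print(escaped)    # ..
```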

You can also use groupby to do this:

from itertools import groupby 

for s in strs:
    mss=max([''.join(v) for k,v in groupby(s)], key=len)
    print(f'{s}: {s.index(mss)}, {len(mss)}')
# same output
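Both versions find the longest substring first and then call s.index to locate it, which scans the string a second time. A sketch that tracks the position while grouping instead; the helper name longest_run and its (0, 0) result for an empty string are assumptions, not part of the answer:

```python
from itertools import groupby

def longest_run(s):
    """Return (start, length) of the first longest run of equal characters.

    Hypothetical helper; returns (0, 0) for an empty string (an assumption).
    """
    best = (0, 0)
    pos = 0  # index in s where the current run starts
    for _, run in groupby(s):
        length = sum(1 for _ in run)  # consume the group iterator to measure it
        if length > best[1]:  # strict '>' keeps the earliest run on ties
            best = (pos, length)
        pos += length
    return best

for s in ("10000111", "aabbbbbCdAA", "abbbccda"):
    print(f"{s}: {longest_run(s)}")
```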
