How to find the longest repeating sequence using python

Question

I went through an interview, where they asked me to print the longest repeated character sequence.

I got stuck is there any way to get it?

But my code prints only the count of characters present in a string is there any approach to get the expected output

import pandas as pd
import collections

a   = 'abcxyzaaaabbbbbbb'
lst = collections.Counter(a)
df  = pd.Series(lst)
df

Expected output :

bbbbbbb

How to add logic to in above code?

@Psidom : I was giving it a try but don't know how to implement logic checking is it possible — , Oct 24 '21 at 05:58
`res = max(("".join(g) for _, g in groupby(a)), key=len)` -> `'bbbbbbb'`. Here, `groupby` is [`itertools.groupby`](https://docs.python.org/3/library/itertools.html#itertools.groupby) — Ch3steR, Oct 24 '21 at 06:00
@Ch3steR : can you explain in detail no idea what is happening — , Oct 24 '21 at 06:01
`itertools.groupby` groups repeated consecutive data into one group e.g `[1, 1, 2, 2, 1, 1]` -> `[[1, 1], [2, 2], [1, 1]]`. Now, get the largest group using `max`. — Ch3steR, Oct 24 '21 at 06:04
Few relevant links: [Get the largest string using `max`](https://stackoverflow.com/a/873333/12416453), [How do I use `itertools.groupby`](https://stackoverflow.com/q/773/12416453) — Ch3steR, Oct 24 '21 at 06:09
@Ch3steR I very much doubt an interviewer would expect the interviewee to select `itertools.groupby` as their solution of choice, consider reopening this question, to allow more generic answers. — Rolf of Saxony, Oct 24 '21 at 06:26
@RolfofSaxony Reopened. But since OP is allowed to use pandas why not itertools which is part of python's standard lib. I reopened because it makes no sense to add generic answer in the linked answer as it specifically asked for itertools. You can add or link itertools answer since it would most likely be pythonic solution. — Ch3steR, Oct 24 '21 at 06:33
@Ch3steR I understand but I think the question can be solved without using such modules and for the less advanced user, it's perhaps more educational to grind through the problem, rather than reach for a tool, which simply gives the answer. Thank you for reopening it. — Rolf of Saxony, Oct 24 '21 at 06:55

no comment · Answer 1 · 2021-10-24T07:39:36.023

3

A regex solution:

max(re.split(r'((.)\2*)', a), key=len)

Or without library help (but less efficient):

s = ''
max((s := s * (c in s) + c for c in a), key=len)

Both compute the string 'bbbbbbb'.

edited Oct 24 '21 at 07:39

answered Oct 24 '21 at 06:56

no comment

6,381
4
12
30

@RolfofSaxony Hmm, what's the point of repeating that? That's just the expected output already shown in the question. I might show my output if it deviated from the expected output (like yours does), but if it's just what's expected... – no comment Oct 24 '21 at 07:10
Feel free to reject the edit, I simply added it for clarity – Rolf of Saxony Oct 24 '21 at 07:11
You could improve the efficiency of the second one a little bit by changing `(c in s)` to `s.endswith(c)` – Alain T. Oct 24 '21 at 07:52
@AlainT. Yes, and I had considered `c in s[:1]`, but neither change the worst case complexity, so I opted for brevity/beauty :-) – no comment Oct 24 '21 at 07:55
@AlainT. Hmm, actually... `c in s` is amortized O(1) here. And quite possibly *faster* than loading and calling the `endswith` method. – no comment Oct 24 '21 at 08:02
Could be, although at every change of character it will go through s to not-find the new character. In any case I also subscribe to selecting brevity & elegance over (questionable) efficiency. – Alain T. Oct 24 '21 at 08:12
@AlainT. With "could be" you mean the "faster", not the amortized O(1), right? Also, `in` for strings is almost unbelievably fast. As in it's faster than CPU speed, and the first time I measured it, I had to look up how it's implemented in order to believe it. – no comment Oct 24 '21 at 08:36
I was agreeing with you, and yes I meant faster. I didn't know about in's speed though. good to know. – Alain T. Oct 24 '21 at 08:38
@AlainT. Little demo, [50 billion chars per second](https://tio.run/##LY3bCoMwEETf8xWDL1GRWimUSvFjahpxwWxCkkJL6ben8bIvwzkMs@4TZ8uXm/MpTd4aRDKaIsg46@NBQjAGdOe6vgp@mVH7DdcTayPTXiylAjGCbDAVIWt5kqjx5d8dasW3LBrsE8MeFdpDCOE8cSw5m221Raf7BnKkZSHLUPPDB7j8PWhl@SmrlP4) :-) – no comment Oct 24 '21 at 08:46

Alain T. · Answer 2 · 2021-10-24T08:24:28.073

Without any modules, you could use a comprehension to go backward through possible sizes and get the first character multiplication that is present in the string:

next(c*s for s in range(len(a),0,-1) for c in a if c*s in a)

That's quite bad in terms of efficiency though

another approach would be to detect the positions of letter changes and take the longest subrange from those

chg = [i for i,(x,y) in enumerate(zip(a,a[1:]),1) if x!=y]
s,e = max(zip([0]+chg,chg+[len(a)]),key=lambda se:se[1]-se[0])
longest = a[s:e]

Of course a basic for-loop solution will also work:

si,sc = 0,"" # current streak (start, character)
ls,le = 0,0  # longest streak (start, end)
for i,c in enumerate(a+" "):      # extra space to force out last char.
    if i-si > le-ls: ls,le = si,i # new longest
    if sc != c:      si,sc = i,c  # new streak
longest = a[ls:le]

print(longest) # bbbbbbb

score 1 · Answer 3 · answered Oct 24 '21 at 06:49

A more long winded solution, picked wholesale from:
maximum-consecutive-repeating-character-string

def maxRepeating(str):
 
    len_s = len(str)
    count = 0
 
    # Find the maximum repeating
    # character starting from str[i]
    res = str[0]
    for i in range(len_s):
         
        cur_count = 1
        for j in range(i + 1, len_s):
            if (str[i] != str[j]):
                break
            cur_count += 1
 
        # Update result if required
        if cur_count > count :
            count = cur_count
            res = str[i]
    return res, count
 
# Driver code
if __name__ == "__main__":
    str = "abcxyzaaaabbbbbbb"
    print(maxRepeating(str))

Solution:

('b', 7)

@don'ttalkjustcode Ha! Why yes, as a matter of fact I did. Simply because of the environment which caused this question to be asked, namely an interview, without access to SO. I had a sneaking suspicion that this question might attract more than a few solutions. :) — Rolf of Saxony, Oct 24 '21 at 07:04

How to find the longest repeating sequence using python

3 Answers3