There are many questions which ask to find the longest repeating substring:
- Longest substring that occurs at least twice: C++ question
- Find longest repeating substring in JavaScript using regular expressions
- Find longest repeating strings?
- Regex to match the longest repeating substring
But these don't exactly match my requirements, which are:
- The substrings may not overlap (they are disjoint).
- The substrings are allowed to be non-adjacent.
- Any characters are allowed.
- I would like to match the longest pattern like this.
So far I have this:
>>> m = re.match(".*(?P<grp>.+).*(?P=grp).*", "dhiblhip")
>>> m.group('grp')
'i'
I think this is matching the last substring which repeats itself, 'i'
, but that's certainly not the longest one. I'd expect the following output for the following input:
'123abc'
->''
'hh'
->'h'
'hihi'
->'hi'
'dhiblhip'
->'hi'
'phiblhip'
->'hi'
(note how I do not return'p'
since it is not as long as'hi'
even though it is a repeating disjoint substring.)'racecaracecar'
->'raceca'
(note how I can't recycle the middler
.) In this case,'acecar'
is just as acceptable.
I am using Python's re
and would like to continue to do so, but answers in another language are not unwelcome.