1

I'm working on a problem to find wholly repeated shortest substring of a given string, and if no match, return length of the string.

My major idea is learned from Juliana's answer here (Check if string is repetition of an unknown substring), I rewrite the algorithm in Python 2.7.

I think it should be O(n^2), but not confident I am correct, here is my thought -- since in the outer loop, it tries possibility of begin character to iterate with -- it is O(n) external loop, and in the inner loop, it compares character one by one -- it is O(n) internal comparison. So, overall time complexity is O(n^2)? If I am not correct, please help to correct. If I am correct, please help to confirm. :)

Input and output example

catcatcat => 3
aaaaaa=>1
aaaaaba = > 7

My code,

def rorate_solution(word):
    for i in range(1, len(word)//2 + 1):
        j = i
        k = 0
        while k < len(word):
            if word[j] != word[k]:
                break
            j+=1
            if j == len(word):
                j = 0
            k+=1
        else:
            return i

    return len(word)

if __name__ == "__main__":
    print rorate_solution('catcatcat')
    print rorate_solution('catcatcatdog')
    print rorate_solution('abaaba')
    print rorate_solution('aaaaab')
    print rorate_solution('aaaaaa')
Community
  • 1
  • 1
Lin Ma
  • 9,739
  • 32
  • 105
  • 175
  • 1
    (`rorate` isn't ringing too many bells - consider _shift_.) With regard to time complexity analysis, please state whether you consider complexity of the algorithm/_procedure_ presented, or the problem (fastest algorithm). – greybeard Oct 27 '16 at 06:04
  • 1
    (The title of this post is appalling. I'm sorry not to have found an "official" name for (the participants in) this problem. _Periodic string_ does seem to be an established term for strings that are repetitions of one smaller string (I call it _base_ for this comment), but even the meaning of _period_ in that context seems to differ between the number of repetitions (not necessarily greatest-) and the repeated string (not necessarily shortest-). I'm not even sure all agree on the string to necessarily begin and end with _base_. Another diction dubs string a _power_ of _base_. – greybeard Oct 27 '16 at 06:18
  • 1
    Isn't it a duplicate with http://stackoverflow.com/questions/6021274/finding-shortest-repeating-cycle-in-word ? – Damien Prot Oct 27 '16 at 09:25
  • 1
    @DamienProt: _If_ the question was _Fastest algorithm to find, for a given string `s`, the shortest word `w` such that `s` is a repetition of `w`?_, [Buge](http://stackoverflow.com/users/830749/buge) gave [the perfect answer: first half of KMP](http://stackoverflow.com/a/33864413/3789665). (I dislike the KMP-presentation in the English wikipedia.) – greybeard Oct 27 '16 at 10:23
  • 1
    Similar (at least to Damien Prot's preceding find): [How to detect if a repeating pattern exists](http://stackoverflow.com/a/26550530/3789665). (See "the possible duplicate link" of that question for a _different_ approach to periodicity.) – greybeard Oct 27 '16 at 11:29

1 Answers1

1

Your assessment of the runtime of your re-write is correct.


But Use just the preprocessing of KMP to find the shortest period of a string.


(The re-write could be more simple:

def shortestPeriod(word):
    """the length of the shortest prefix p of word
    such that word is a repetition p
    """
# try prefixes of increasing length
    for i in xrange(1, len(word)//2 + 1):
        j = i
        while word[j] == word[j-i]:
            j += 1
            if len(word) <= j:
                return i

    return len(word)

if __name__ == "__main__":
    for word in ('catcatcat', 'catcatcatdog',
                 'abaaba',  'ababbbababbbababbbababbb',
                 'aaaaab', 'aaaaaa'):
        print shortestBase(word)

- yours compares word[0:i] to word[i:2*i] twice in a row.)

Community
  • 1
  • 1
greybeard
  • 2,249
  • 8
  • 30
  • 66
  • Thanks greybeard, read through the KMP algorithm code, and confused by this line `smallPieceLen = len(inputv) - nxt[-1]`, and `nxt[-1]` means if the whole string does not match, what index we will be used to compare next. `smallPieceLen` represents the differences total length of string and `nxt[-1]`, how it could be represented as shortest repetitive string? – Lin Ma Oct 27 '16 at 22:37
  • Hi greybeard, I read algorithm you referred, it seems different from KMP, https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm, KMP the first two elements of the table should be `-1` and `0`, but you referred post, initial elements are `0`. – Lin Ma Oct 28 '16 at 07:00
  • 1
    (Just added a comment where the variable names are in context. Would you believe someone else commented on en.wikipedia's description of KMP to a post by some Lin Ma just a year ago at CR?) – greybeard Oct 28 '16 at 09:42