You want KMP
algorithm kmp_search:
input:
an array of characters, S (the text to be searched)
an array of characters, W (the word sought)
output:
an integer (the zero-based position in S at which W is found)
define variables:
an integer, m ← 0 (the beginning of the current match in S)
an integer, i ← 0 (the position of the current character in W)
an array of integers, T (the table, computed elsewhere)
while m+i is less than the length of S, do:
if W[i] = S[m + i],
if i equals the (length of W)-1,
return m
let i ← i + 1
otherwise,
let m ← m + i - T[i],
if T[i] is greater than -1,
let i ← T[i]
else
let i ← 0
(if we reach here, we have searched all of S unsuccessfully)
return the length of S
Complexity:
Assuming the prior existence of the table T, the search portion of the
Knuth–Morris–Pratt algorithm has complexity O(k), where k is the
length of S and the O is big-O notation. As except for the fixed
overhead incurred in entering and exiting the function, all the
computations are performed in the while loop, we will calculate a
bound on the number of iterations of this loop; in order to do this we
first make a key observation about the nature of T. By definition it
is constructed so that if a match which had begun at S[m] fails while
comparing S[m + i] to W[i], then the next possible match must begin at
S[m + (i - T[i])]. In particular the next possible match must occur at
a higher index than m, so that T[i] < i. Using this fact, we will show
that the loop can execute at most 2k times. For in each iteration, it
executes one of the two branches in the loop. The first branch
invariably increases i and does not change m, so that the index m + i
of the currently scrutinized character of S is increased. The second
branch adds i - T[i] to m, and as we have seen, this is always a
positive number. Thus the location m of the beginning of the current
potential match is increased. Now, the loop ends if m + i = k;
therefore each branch of the loop can be reached at most k times,
since they respectively increase either m + i or m, and m ≤ m + i: if
m = k, then certainly m + i ≥ k, so that since it increases by unit
increments at most, we must have had m + i = k at some point in the
past, and therefore either way we would be done. Thus the loop
executes at most 2k times, showing that the time complexity of the
search algorithm is O(k).
additional information:
Efficiency of the KMP algorithm
Since the two portions of the algorithm have, respectively,
complexities of O(k) and O(n), the complexity of the overall algorithm
is O(n + k). These complexities are the same, no matter how many
repetitive patterns are in W or S.