
I'm trying to understand why the worst case running time of the Rabin-Karp algorithm is O(nm) and the average case is O(n+m).

Can someone help me with that?

user6812711
  • This is a duplicate of http://stackoverflow.com/questions/39054972/rabin-karp-algorithm-how-is-the-worst-case-omn-for-the-given-input but SO won't let me use that question because the answer was neither accepted nor upvoted. – rici Sep 09 '16 at 16:53

2 Answers


The Wikipedia article explains the time complexity of the algorithm pretty well.

The efficiency of the hash function (read: its ability to reuse the previously calculated hash value in constant time) is the deciding factor in the algorithm's time complexity.

Let us see how the hash computation makes this difference.


Time complexity is O(nm) when:

call hash(s[1..m])                  // O(m), computed once
for index from 1 to n-m+1           // O(n) iterations
    // check whether the current window matches the pattern
    call hash(s[index+1..index+m])  // inefficient hash: rehashes the whole window in O(m), just like naive string matching

Compared to O(nm), the additive O(m) term is negligible and is dropped.

This gives: O(m) + O(n)*O(m) = O(nm)
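
For concreteness, here is a minimal Python sketch of this slow variant; the function name and the base/modulus choices (256 and 101) are illustrative assumptions, not part of the answer above:

    def rabin_karp_slow(text, pattern, base=256, mod=101):
        """Rabin-Karp with a from-scratch hash per window: O(n*m) overall."""
        n, m = len(text), len(pattern)
        if m == 0 or m > n:
            return []

        def window_hash(s):
            # O(m): walks the entire window on every call
            h = 0
            for ch in s:
                h = (h * base + ord(ch)) % mod
            return h

        target = window_hash(pattern)                 # O(m), once
        matches = []
        for i in range(n - m + 1):                    # O(n) iterations
            if window_hash(text[i:i + m]) == target:  # O(m) per window
                if text[i:i + m] == pattern:          # verify: hashes can collide
                    matches.append(i)
        return matches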


Time complexity is O(n+m) when:

call hash(s[1..m])                  // O(m), computed once
for index from 1 to n-m+1           // O(n) iterations
    // check whether the current window matches the pattern
    call hash(s[index+1..index+m])  // efficient rolling hash: updates the previous hash in O(1)

This gives: O(m) + O(n)*O(1) = O(m) + O(n) = O(m+n)
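
And a sketch of the efficient variant under the same illustrative assumptions: the rolling update derives each window's hash from the previous one in O(1) via h' = ((h - ord(dropped) * base^(m-1)) * base + ord(added)) mod q:

    def rabin_karp_rolling(text, pattern, base=256, mod=101):
        """Rabin-Karp with an O(1) rolling-hash update: O(n+m) expected."""
        n, m = len(text), len(pattern)
        if m == 0 or m > n:
            return []

        high = pow(base, m - 1, mod)   # weight of the window's leading character

        # O(m): hash the pattern and the first window once.
        target = window = 0
        for i in range(m):
            target = (target * base + ord(pattern[i])) % mod
            window = (window * base + ord(text[i])) % mod

        matches = []
        for i in range(n - m + 1):                    # O(n) iterations
            if window == target and text[i:i + m] == pattern:
                matches.append(i)                     # verified true match
            if i < n - m:
                # O(1) update: drop text[i], shift, append text[i+m].
                window = ((window - ord(text[i]) * high) * base
                          + ord(text[i + m])) % mod
        return matches

Note that the O(m) character-by-character verification still runs whenever the hashes match, but with a reasonable modulus false positives are rare, so the expected total stays O(n+m).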

Saurav Sahu

Rabin-Karp is worst-case O(nm) because it could find a false positive at every position (of which there are n), and each candidate can require up to m comparisons to verify, since you need to actually compare the strings.

With even a half-reasonable hash function that shouldn't ever happen, but for pretty well any hash function it is possible to craft a query (that is, both the string and the substring being searched for) which exhibits the above pathological behaviour.

Hence, although R-K has an expected time complexity of O(n), its worst case has time complexity O(nm). (Note: since m must be no greater than n, n + m is bounded by 2n and thus O(n + m) is the same as O(n).)

It is easier to produce O(nm) behaviour if the problem is to find all the matching substrings, which is another context in which R-K is often used. In that case, searching for a substring consisting of m `a`s in a string consisting of n `a`s would definitely take nm time, since the substring needs to be matched and verified at every point in the source string.
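
A toy illustration of that input family (the concrete sizes n = 20 and m = 5 are my own choice, not from the answer): every window is a genuine match, so any matcher that verifies each candidate character by character does about (n - m + 1) * m, i.e. roughly nm, work.

    # Hypothetical worst-case input for the find-all-matches problem.
    n, m = 20, 5
    text, pattern = "a" * n, "a" * m
    # Every window matches, so there are n - m + 1 candidates and each
    # verification compares m characters: about n*m work in total.
    matches = [i for i in range(n - m + 1) if text[i:i + m] == pattern]
    print(len(matches))   # 16, i.e. n - m + 1 matches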

Other algorithms exist that find all matching substrings in time linear in n, even in pathological cases.

rici
  • Thanks for the help! Now I understand the worst-case running time complexity... but I still don't understand the average case of this algorithm. Can you please explain more about that? – user6812711 Sep 11 '16 at 03:24
  • @user6812711: The scenario I outline is extremely rare. Normally, there are few or no false positives, and the few false positives are identified by looking at only the first character or two. So almost always the time it takes is O(n) to find the correct string and O(m) to verify that it's correct. As I explain in the answer, since `m` is no greater than `n`, O(n + m) is the same as O(n). – rici Sep 11 '16 at 03:39