40

I have generated an string using the following alphabet. {A,C,G,T}. And my string contains more than 10000 characters. I'm searching the following patterns in it.

  • ATGGA
  • TGGAC
  • CCGT

I have asked to use a string matching algorithm which has O(m+n) running time.

m = pattern length
n = text length

Both KMP and Rabin-Karp algorithms have this running time. What is the most suitable algorithm (between Rabin-Carp and KMP) in this situation?

Bernhard Barker
  • 54,589
  • 14
  • 104
  • 138
Sukeshini
  • 1,241
  • 2
  • 23
  • 48
  • If you already have some code implemented for either or both, you may also want to post this in codereview.stackexchange.com – shree.pat18 Apr 28 '14 at 09:05
  • Thanks for the quick response. I have developed up to generating the string. I want to verify what is the algorithm to be used. Then only I can continue with the development – Sukeshini Apr 28 '14 at 09:14
  • 2
    Rabin-Karp is `O(n*m)` (worst case). – Michael Foukarakis Apr 28 '14 at 09:28
  • 2
    Have you thought of Aho-Corasick? It's very close to your requirement of `O(m+n)`, is a good choice for matching multiple patterns, and is easily parallelizable. – Michael Foukarakis Apr 28 '14 at 09:37
  • 2
    @Michael Foukarakis: Thanks for the suggestion. But I want asked to select one algorithm between these two only. – Sukeshini Apr 28 '14 at 10:24

1 Answers1

39

When you want to search for multiple patterns, typically the correct choice is to use Aho-Corasick, which is somewhat a generalization of KMP. Now in your case you are only searching for 3 patterns so it may be the case that KMP is not that much slower(at most three times), but this is the general approach.

Rabin-Karp is easier to implement if we assume that a collision will never happen, but if the problem you have is a typical string searching KMP will be more stable no matter what input you have. However, Rabin-Karp has many other applications, where KMP is not an option.

jpp
  • 159,742
  • 34
  • 281
  • 339
Ivaylo Strandjev
  • 69,226
  • 18
  • 123
  • 176
  • 11
    In this particular case your strings are pretty small so you can compute perfect hash, avoiding collisions(with a slight modification of the algorithm). Thus I think both approaches will work. If the search patterns can grow longer this would not be possible. My answer aims at explaining the general logic for similar problems. For this problem I think both approaches are equally good. Maybe you could benchmark the two solutions and choose the better performing one? – Ivaylo Strandjev Apr 29 '14 at 09:58
  • 1
    Thanks. What are the applications in "However Rabin-Karp has many other applications, where KMP is not an option"? What do you mean by stable in "a typical string searching KMP will be more stable "? – Tim Oct 20 '16 at 22:51
  • 4
    @Tim Rabin Karp is dependent on the choice of the hash function and no matter what function you choose there will be cases where the performance degrades because of the collisions. KMP does not have this drawback and this is what I meant by "more stable"(maybe the phrase is not the best suited to use in this context). I have used Rabin-Karp to solve many different problems, but here are some other applications: it can be used to solve maximum sub-palindrome problem(there are other approaches too), I have used it to find the longest sub string that repeated generates a given input string. – Ivaylo Strandjev Oct 21 '16 at 07:39
  • @IvayloStrandjev or anyone can give scenarios/problems where Rabin Karp will be completely suitable? – Saheel Sapovadia Nov 16 '21 at 05:37
  • I have used it for many different problems. For instance using rabin-karp you can compute the longest palindromic substring in O(n*log(n)) (combining two rabin karps for the two directions and binary search). – Ivaylo Strandjev Nov 16 '21 at 09:09