5

I've been working on a Rabin-Karp string matching function in C++ and I'm not getting any results out of it. I have a feeling that I'm not computing some of the values correctly, but I don't know which one(s).

Prototype

void rabinKarp(string sequence, string pattern, int d, int q);

Function Implementation

void rabinKarp(string sequence, string pattern, int d, int q)
{
    //d is the |∑|
    //q is the prime number to use to lessen spurious hits
    int n = sequence.length(); //Length of the sequence
    int m = pattern.length(); //Length of the pattern
    double temp = static_cast<double> (m - 1.0);
    double temp2 = pow(static_cast<double> (d), temp); //Exponentiate d
    int h = (static_cast<int>(temp2)) % q; //High Order Position of an m-digit window
    int p = 0; //Pattern decimal value
    int t = 0; //Substring decimal value
    for (int i = 1; i < m; i++) { //Preprocessing
        p = (d*p + (static_cast<int>(pattern[i]) - 48)) % q;
        t = (d*t + (static_cast<int>(sequence[i])-48)) % q;
    }
    for (int s = 0; s < (n-m); s++) { //Matching(Iterate through all possible shifts)
        if (p == t) {
            for (int j = 0; j < m; j++) {
                if (pattern[j] == sequence[s+j]) {
                    cout << "Pattern occurs with shift: " << s << endl;
                }
            }
        }
        if (s < (n-m)) {
            t = (d*(t - ((static_cast<int>(sequence[s+1]) - 48)*h)) + (static_cast<int>(sequence[s + m + 1]) - 48)) % q;
        }
    }
    return;
}

In my function call I pass 2359023141526739921 as the sequence, 31415 as the pattern, 10 as the radix, and 13 as the prime. I expect there to be one actual match and one spurious hit, but I never get the output statement from the matching part of the function. What am I doing wrong?

Thanks in Advance, Madison

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
Madison S
  • 225
  • 1
  • 10

2 Answers2

8

The big gotcha in coding the Rabin Karp is the modulo operator. When two numbers X and Y are congruent modulo Q then (X % Q) should equal (Y % Q) but on the C++ compiler you're using they will only be equal if X and Y are both positive or both negative. If X is positive and Y is negative then (X % Q) will be positive and (Y % Q) will negative. In fact (X % Q)-Q == (Y % Q) in this case.

The work around is to check for negative values after each modulo and if there are any to add q to the variable, so your preprocessing loop becomes :

    p = (d*p + pattern[i]) % q;
    if ( p < 0 ) p += q;
    t = (d*t + sequence[i]) % q;
    if ( t < 0 ) t += q;

t in the main loop needs to have a similar check added.

paperhorse
  • 4,095
  • 2
  • 22
  • 12
5

Unless you've redefined ^, it is computing xor, not exponentiation. Also, you should be careful about overflowing the maximum value of an int before you perform %.

jonderry
  • 23,013
  • 32
  • 104
  • 171
  • Thanks! This helped with the issue I was having with h not being correct. I was unaware that the ^ operator wasn't defined as exponentiation. Still not getting an output though :( – Madison S Dec 04 '10 at 02:46
  • I would verify that small parts of it are behaving as expected, rather than attempting to get everything to work at once. This will help you find your bugs one by one. – jonderry Dec 04 '10 at 03:46
  • Stepping through with GDB has let me to the culprit: Recalculating t in the second for loop is resulting in negative numbers. Everything else is working as intended as far as I can tell. – Madison S Dec 04 '10 at 04:00