5

I was understanding the Rabin-Karp Algorithm from this website: https://www.geeksforgeeks.org/rabin-karp-algorithm-for-pattern-searching/

They had the following code in C++ for the algorithm:

#include <bits/stdc++.h> 
using namespace std; 
  
// d is the number of characters in the input alphabet  
#define d 256  
  
/* pat -> pattern  
    txt -> text  
    q -> A prime number  
*/
void search(char pat[], char txt[], int q)  
{  
    int M = strlen(pat);  
    int N = strlen(txt);  
    int i, j;  
    int p = 0; // hash value for pattern  
    int t = 0; // hash value for txt  
    int h = 1;  
  
    // The value of h would be "pow(d, M-1)%q"  
    for (i = 0; i < M - 1; i++)  
        h = (h * d) % q;  
  
    // Calculate the hash value of pattern and first  
    // window of text  
    for (i = 0; i < M; i++)  
    {  
        p = (d * p + pat[i]) % q;  
        t = (d * t + txt[i]) % q;  
    }  
  
    // Slide the pattern over text one by one  
    for (i = 0; i <= N - M; i++)  
    {  
  
        // Check the hash values of current window of text  
        // and pattern. If the hash values match then only  
        // check for characters on by one  
        if ( p == t )  
        {  
            /* Check for characters one by one */
            for (j = 0; j < M; j++)  
            {  
                if (txt[i+j] != pat[j])  
                    break;  
            }  
  
            // if p == t and pat[0...M-1] = txt[i, i+1, ...i+M-1]  
            if (j == M)  
                cout<<"Pattern found at index "<< i<<endl;  
        }  
  
        // Calculate hash value for next window of text: Remove  
        // leading digit, add trailing digit  
        if ( i < N-M )  
        {  
            t = (d*(t - txt[i]*h) + txt[i+M])%q;  
  
            // We might get negative value of t, converting it  
            // to positive  
            if (t < 0)  
            t = (t + q);  
        }  
    }  
}  
  
/* Driver code */
int main()  
{  
    char txt[] = "GEEKS FOR GEEKS";  
    char pat[] = "GEEK"; 
        
      // A prime number  
    int q = 101;  
      
      // Function Call 
      search(pat, txt, q);  
    return 0;  
}  

What I didn't understand was this block of code:

            // We might get negative value of t, converting it  
            // to positive  
            if (t < 0)  
            t = (t + q);  

How can t ever be negative? What we subtract from t is always less than t and then we add something to it so where does the possibility of t beign negative even come from?

I tested the code without this if statement and it didn't work properly. The exepected output was :

Pattern found at index 0
Pattern found at index 10

But I got:

Pattern found at index 0
Ulrich Eckhardt
  • 16,572
  • 3
  • 28
  • 55
Arsh Sharma
  • 1,129
  • 1
  • 7
  • 10
  • 1
    Forget that website. It demonstrates how *not* to write C++ code and has nothing to do with professional programming. [Why should I not `#include `?](https://stackoverflow.com/questions/31816095/why-should-i-not-include-bits-stdc-h) [Why is `using namespace std;` considered bad practice?](https://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice) – Evg Feb 24 '21 at 07:10
  • Maybe if you indent the second line of the code it becomes clearer? Why shouldn't `t` be negative? You can also set a breakpoint in that line to see when it gets triggered. – Ulrich Eckhardt Feb 24 '21 at 07:15
  • https://stackoverflow.com/questions/7594508/modulo-operator-with-negative-values – Aki Suihkonen Feb 24 '21 at 09:14

2 Answers2

1

Aki Suihkonen has it; with a positive modulus, the result is either zero or has the same sign as the dividend, whereas Rabin--Karp assumes that the result will always be nonnegative.

For example, if we do

t = 3
t = (t + 5) % 7
t = (t - 5) % 7

then the values are

(3 + 5) % 7 == 1
(1 - 5) % 7 == -4

which if we add 7 makes 3 as desired.

David Eisenstat
  • 64,237
  • 7
  • 60
  • 120
0

t can indeed be negative.

  1. As we are taking modulo q in each step, the value of t can be between 0 to q-1.
  2. So, txt[i]*h can be greater than the old value of t. Making (t - txt[i]*h) negative.
  3. Which can possibly result in (d*(t-txt[i]*h)+txt[i+M]) being negative.
  4. In C++, when we try to perform modulus on negative numbers, it returns a negative value. For example,
(-5) % 3 = -2
  1. As a result, if (d*(t-txt[i]*h)+txt[i+M]) is negative, (d*(t-txt[i]*h)+txt[i+M])%q will also be negative. Thus, resulting in t (new) being negative.

[Note that, the modulo operation is handled differently in different language. For example, in Python, the modulo of a negative number gives a positive and valid result. So, we don't need to handle this case in Python.]

For better insight, you can run this slightly modified code that shows the value of each part of the calculation for each iteration i.

Code


#include <bits/stdc++.h> 
using namespace std; 
  
// d is the number of characters in the input alphabet  
#define d 256  
  
/* pat -> pattern  
    txt -> text  
    q -> A prime number  
*/
void search(char pat[], char txt[], long long int q)  
{  
    long long int M = strlen(pat);  
    long long int N = strlen(txt);  
    long long int i, j;  
    long long int p = 0; // hash value for pattern  
    long long int t = 0; // hash value for txt  
    long long int h = 1;  
  
    // The value of h would be "pow(d, M-1)%q"  
    for (i = 0; i < M - 1; i++)  
        h = (h * d) % q;  
  
    // Calculate the hash value of pattern and first  
    // window of text  
    for (i = 0; i < M; i++)  
    {  
        p = (d * p + pat[i]) % q;  
        t = (d * t + txt[i]) % q;  
    }  
  
    // Slide the pattern over text one by one  
    for (i = 0; i <= N - M; i++)  
    {  
  
        // Check the hash values of current window of text  
        // and pattern. If the hash values match then only  
        // check for characters on by one  
        if ( p == t )  
        {  
            /* Check for characters one by one */
            for (j = 0; j < M; j++)  
            {  
                if (txt[i+j] != pat[j])  
                    break;  
            }  
  
            // if p == t and pat[0...M-1] = txt[i, i+1, ...i+M-1]  
            if (j == M)  
                cout<<"[Pattern Found] at index "<< i<<endl;  
        }  
  
        // Calculate hash value for next window of text: Remove  
        // leading digit, add trailing digit
        if ( i < N-M )  
        {  
            cout << "i = " << i << endl;
            cout << "------------------------------------------" << endl;
            cout << "t                          :" << t << endl;
            cout << "txt[i]*h                   :" << txt[i]*h << endl;
            cout << "t-txt[i]*h                 :" << t - txt[i]*h << endl;
            cout << "d*(t-txt[i]*h)             :" << d*(t - txt[i]*h) << endl;
            cout << "(d*(t-txt[i]*h)+txt[i+M])  :" << (d*(t - txt[i]*h) + txt[i+M]) << endl;
            cout << "(d*(t-txt[i]*h)+txt[i+M])%q:" << (d*(t - txt[i]*h) + txt[i+M])%q << endl;

            t = (d*(t - txt[i]*h) + txt[i+M])%q;  

            // We might get negative value of t, converting it  
            // to positive
            if (t < 0)  
                t = (t + q);  

            cout << "new_t                      :" << t << endl;
            cout << "------------------------------------------" << endl;
        }  
    }  
}  
  
/* Driver code */
int main()  
{  
    char txt[] = "GEEKS FOR GEEKS";  
    char pat[] = "GEEK"; 
        
      // A prime number  
    long long int q = 101;  
      
      // Function Call 
      search(pat, txt, q);  
    return 0;  
}  

Output

[Pattern Found] at index 0
i = 0
------------------------------------------
t                          :27
txt[i]*h                   :355
t-txt[i]*h                 :-328
d*(t-txt[i]*h)             :-83968
(d*(t-txt[i]*h)+txt[i+M])  :-83885
(d*(t-txt[i]*h)+txt[i+M])%q:-55
new_t                      :46
------------------------------------------
i = 1
------------------------------------------
t                          :46
txt[i]*h                   :345
t-txt[i]*h                 :-299
d*(t-txt[i]*h)             :-76544
(d*(t-txt[i]*h)+txt[i+M])  :-76512
(d*(t-txt[i]*h)+txt[i+M])%q:-55
new_t                      :46
------------------------------------------
i = 2
------------------------------------------
t                          :46
txt[i]*h                   :345
t-txt[i]*h                 :-299
d*(t-txt[i]*h)             :-76544
(d*(t-txt[i]*h)+txt[i+M])  :-76474
(d*(t-txt[i]*h)+txt[i+M])%q:-17
new_t                      :84
------------------------------------------
i = 3
------------------------------------------
t                          :84
txt[i]*h                   :375
t-txt[i]*h                 :-291
d*(t-txt[i]*h)             :-74496
(d*(t-txt[i]*h)+txt[i+M])  :-74417
(d*(t-txt[i]*h)+txt[i+M])%q:-81
new_t                      :20
------------------------------------------
i = 4
------------------------------------------
t                          :20
txt[i]*h                   :415
t-txt[i]*h                 :-395
d*(t-txt[i]*h)             :-101120
(d*(t-txt[i]*h)+txt[i+M])  :-101038
(d*(t-txt[i]*h)+txt[i+M])%q:-38
new_t                      :63
------------------------------------------
i = 5
------------------------------------------
t                          :63
txt[i]*h                   :160
t-txt[i]*h                 :-97
d*(t-txt[i]*h)             :-24832
(d*(t-txt[i]*h)+txt[i+M])  :-24800
(d*(t-txt[i]*h)+txt[i+M])%q:-55
new_t                      :46
------------------------------------------
i = 6
------------------------------------------
t                          :46
txt[i]*h                   :350
t-txt[i]*h                 :-304
d*(t-txt[i]*h)             :-77824
(d*(t-txt[i]*h)+txt[i+M])  :-77753
(d*(t-txt[i]*h)+txt[i+M])%q:-84
new_t                      :17
------------------------------------------
i = 7
------------------------------------------
t                          :17
txt[i]*h                   :395
t-txt[i]*h                 :-378
d*(t-txt[i]*h)             :-96768
(d*(t-txt[i]*h)+txt[i+M])  :-96699
(d*(t-txt[i]*h)+txt[i+M])%q:-42
new_t                      :59
------------------------------------------
i = 8
------------------------------------------
t                          :59
txt[i]*h                   :410
t-txt[i]*h                 :-351
d*(t-txt[i]*h)             :-89856
(d*(t-txt[i]*h)+txt[i+M])  :-89787
(d*(t-txt[i]*h)+txt[i+M])%q:-99
new_t                      :2
------------------------------------------
i = 9
------------------------------------------
t                          :2
txt[i]*h                   :160
t-txt[i]*h                 :-158
d*(t-txt[i]*h)             :-40448
(d*(t-txt[i]*h)+txt[i+M])  :-40373
(d*(t-txt[i]*h)+txt[i+M])%q:-74
new_t                      :27
------------------------------------------
[Pattern Found] at index 10
i = 10
------------------------------------------
t                          :27
txt[i]*h                   :355
t-txt[i]*h                 :-328
d*(t-txt[i]*h)             :-83968
(d*(t-txt[i]*h)+txt[i+M])  :-83885
(d*(t-txt[i]*h)+txt[i+M])%q:-55
new_t                      :46
------------------------------------------
smmehrab
  • 756
  • 9
  • 16