10

I am having trouble understanding how the rolling hash algorithm works after the hash has been reduced to modulus value by dividing by a prime number.

Consider the sequence of 5 digits in the number 123456.

The first chunk is 12345. I store the value, in the next window, 6 comes in and 1 goes out.

So the new hash will be (12345-1*10^4)*10 + 6 = 23456. This is fairly intuitive.

As obvious, these numbers are large, so we need a modulus function to keep them small. Suppose I take 101 as the prime number for this purpose.

So 12345 will reduce to 23. How then, from this, will I derive the rolling hash for the next window, 23456?

SexyBeast
  • 7,913
  • 28
  • 108
  • 196

2 Answers2

6

You calculate it the same way you calculate 23456, but always with modulo 101.

(((23 - (10^4 mod 101))*10) mod 101 + 6) mod 101 = 24.

which is the value you want because 23456 mod 101 = 24.

dejvuth
  • 6,986
  • 3
  • 33
  • 36
2

Answer by @dejvuth is right - i would add this specifically when doing rabin-karp is that sometimes you may end up with a -ve modulus value - in that case it is better to take the +ve equivalent of that modulus value - so that checking if the same modulus was seen before is easier.

For example: with this pattern "abcdabc" - and hash function: hash(i) = (49*S[i]+7*S[i+1]+1*S[i+2])%1123

Results in:

"abc" -> 1046
"bcd" -> 1103
"cda" -> 33
"dab" -> 62
"abc" -> -77

second occurence of "abc" results is -77 that is modulo equivalent of 1046 since (-77 + 1123 = 1046)

PS: i don't have enough "reputation" at this time to add this as a comment..

Ravi
  • 68
  • 5
  • I cannot thank you enough for this answer. I do not know modular arithmetic yet, so I wasted 2 hours on basically rectifying what was not broken. – Poseidon Broger Jan 06 '21 at 22:20