3

Apologies for this probable repeated question.

I am trying to use rolling hash with Karp Rabin.I have looked a different implementations of rolling hash and I am wondering where I am going wrong. Though the text has the pattern the match using hash does not seem to be occurring at all. Attaching (a part of)code for calculating the hash and the search.

long hash(char* key, int len) {
int j = 0;
unsigned long long h = 0;
for (j = 0; j < len; j++) {
    h = h * PRIME_BASE + key[j];
    h %= PRIME_MOD;
}
return h;
}



int search(char* pattern, char *txt, int textLength, int patternLength) {

int i, val = 0;

long long txtHash=0;

long power = 1;
for (i = 0; i < patternLength; i++)
    power = (power * PRIME_BASE) % PRIME_MOD;
i=0;
printf(" the value of power is %ld ",power);
for (i = 0; i < textLength; i++) {
    txtHash = txtHash * PRIME_BASE + txt[i];
    txtHash %= PRIME_MOD;
    if (i >= patternLength)
    {
    txtHash -= power * txt[i - patternLength] % PRIME_MOD;

    if (txtHash < 0){
      //negative can be made positive with mod
        txtHash += PRIME_MOD;
    }
    }
    int offset=0;
    if(i>=patternLength){
    offset=i-patternLength+1;
    }
    else{
        offset=0;
    }

    if (patHash == txtHash) {
        if (check(pattern, txt, offset, patternLength)) {
            val++;
        }
    }

}
if (val > 0) {
    return val;
}
// no match
return 0;
}


bool check(char* pattern, char* txt, int k, int M) {
int j = 0;

for (j = 0; j < M; j++) {
    if (pattern[j] != txt[k + j]) {
        return false;
    }
}
return true;
}

I had the problem of the buffer overflow, which I have dealt with but the pattern and text hash do not seem to be matching for a protein sequence text string( with 1000 characters) and 17 characters pattern Any ideas where I could be going wrong ?

Thanks, Bhavya

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
bhavs
  • 2,091
  • 8
  • 36
  • 66
  • instead of a rolling hash I used computed the hash of the text by calling the hash function repeatedly and that seems to be working, but of course the performance as is bad as a brute force algorithm – bhavs Oct 19 '11 at 18:59

1 Answers1

0

I spent some more time on this problem and found that because I had initialised the value of long long txtHash to some default value I was facing the situation wherein the hashes were not matching. updating the code above with the fix

bhavs
  • 2,091
  • 8
  • 36
  • 66