I am currently experimenting with a very simple Boyer-Moore variant.
In general my implementation works, but if I try to utilize it in a loop the character pointer containing the haystack gets messed up. And I mean that characters in it are altered, or mixed.
The result is consistent, i.e. running the same test multiple times yields the same screw up.
This is the looping code:
string src("This haystack contains a needle! needless to say that only 2 matches need to be found!");
string pat("needle");
const char* res = src.c_str();
while((res = boyerMoore(res, pat)))
++res;
This is my implementation of the string search algorithm (the above code calls a convenience wrapper which pulls the character pointer and length of the string):
unsigned char*
boyerMoore(const unsigned char* src, size_t srcLgth, const unsigned char* pat, size_t patLgth)
{
if(srcLgth < patLgth || !src || !pat)
return nullptr;
size_t skip[UCHAR_MAX]; //this is the skip table
for(int i = 0; i < UCHAR_MAX; ++i)
skip[i] = patLgth; //initialize it with default value
for(size_t i = 0; i < patLgth; ++i)
skip[(int)pat[i]] = patLgth - i - 1; //set skip value of chars in pattern
std::cout<<src<<"\n"; //just to see what's going on here!
size_t srcI = patLgth - 1; //our first character to check
while(srcI < srcLgth)
{
size_t j = 0; //char match ct
while(j < patLgth)
{
if(src[srcI - j] == pat[patLgth - j - 1])
++j;
else
{
//since the number of characters to skip may be negative, I just increment in that case
size_t t = skip[(int)src[srcI - j]];
if(t > j)
srcI = srcI + t - j;
else
++srcI;
break;
}
}
if(j == patLgth)
return (unsigned char*)&src[srcI + 1 - j];
}
return nullptr;
}
The loop produced this output (i.e. these are the haystacks the algorithm received):
- This haystack contains a needle! needless to say that only 2 matches need to be found!
- eedle! needless to say that only 2 matches need to be found!
- eedless to say that eed 2 meed to beed to be found!
As you can see the input is completely messed up after the second run. What am I missing? I thought the contents could not be modified, since I'm passing const
pointers.
Is the way of setting the pointer in the loop wrong, or is my string search screwing up?
Btw: This is the complete code, except for includes and the main function around the looping code.
EDIT:
The missing nullptr
of the first return was due to a copy/paste error, in the source it is actually there.
For clarification, this is my wrapper function:
inline char* boyerMoore(const string &src, const string &pat)
{
return (const char*) boyerMoore((const unsigned char*) src.c_str(), src.size(),
(const unsigned char*) pat.c_str(), pat.size());
}