1

The task of the program is to check if a string s2 is a substring of another string (s1 + s1) given s1 and s2 which are equal in length.

For example: [s1, s2] = ["abc", "bca"] should return true while [s1, s2] = ["abc", "bac"] should return false.

and the limits of the length of both strings is 10^5.

using (s1+s1).find(s2) == string::npos took about .1 second to finish.

I implemented it in a naive approach with a complexity of O(n*m) and it took 30 seconds!!

s1 = s1 + s1;
bool f = 0;
for(int i = 0;i < s1.size() - s2.size() + 1;i++){
    f = 1;
    for(int j = 0;j < s2.size();j++){
        if(s1[i+j] != s2[j]){
            f = 0;
            break;
        }
    }
    if(f)
        break;
}
//the answer is f

So I thought that C++ used an algorithm like KMP but the I found out that it is implementation-defind and GNU gcc used only the naive approach with some improvements.

But that's not even the biggest problem. when I replaced the inner loop with s1.compare(i, s2.size(), s2), it took about the same time as STL find method .1 second.

bool f = 0;
for(int i = 0;i < s1.size() - s2.size() + 1;i++){
    if(s1.compare(i, s2.size(), s2) == 0){
        f = 1;
        break;
    }

}

So does C++ compilers implement compare in a different way?

And what is the improvements of the method used by C++ compilers to surpass the naive approach by 300 times while using the same complexity?

NOTE: The test I used for the previous codes is

s1 = "ab"*50000 + "ca"

s2 = "ab"*50000 + "ac"

Omar Elawady
  • 3,300
  • 1
  • 13
  • 17
  • It depends a lot on how `string::find()` is implemented. I would look at the following: https://stackoverflow.com/questions/3183582/what-is-the-fastest-substring-search-algorithm –  Aug 26 '17 at 21:20
  • @Frank I searched a lot to find out how is it implemented in gcc. But I couldn't find it in a standard or something like that and I didn't understand the source code. But no one -I found- said it used an advanced algorithm, only the naive approach with improvements. – Omar Elawady Aug 26 '17 at 21:30
  • 3
    Well, yes. Runtime libraries implement `compare` more efficiently than the naive approach. The test data you supplied is nearly a worst case for the naive algorithm, and almost a best case for things like Boyer-Moore and Knuth-Morris-Pratt. Because `s2` ends with `c`, and there are only two instances of `c` in `s1`, those algorithms end up only having to do two string comparisons. – Jim Mischel Aug 26 '17 at 21:31
  • 1
    If you're looking for source, see https://stackoverflow.com/questions/3170295/where-can-i-find-the-implementation-for-stdstring – Jim Mischel Aug 26 '17 at 21:32
  • 3
    you're probably running the debug build. switch to the release/optimized configuration and try again. – Matt Timmermans Aug 26 '17 at 21:32
  • I can't reproduce your benchmark – amchacon Aug 26 '17 at 21:33
  • @MattTimmermans It seems that this indeed the problem I compiled it using `g++ --std=c++11 -O2 ./main.cpp` and it took only 3 seconds. – Omar Elawady Aug 26 '17 at 21:44
  • 1
    A possible reason is that the STL implementation employs `memcmp`, which, like many other C memory functions, are heavily optimized and often written in assembly. Your naive method compares one character at a time, which wastes a lot of memory bandwidth; any optimized method would load multiple characters and compare simultaneously – meowgoesthedog Aug 26 '17 at 21:53
  • @meowgoesthedog Thanks for pointing that out. I looked at the source and it doesn't use Knuth-Morris-Pratt algorithm nor Boyer-Moore [source](https://pastebin.com/sday9X8K), Or there's something I am missing? – Omar Elawady Aug 26 '17 at 22:18
  • 1
    It is just a simple, character-by-character comparison. Nothing to do with KMP as such. – meowgoesthedog Aug 26 '17 at 22:22
  • @amchacon I used this [code](https://ideone.com/i8j1um) – Omar Elawady Aug 26 '17 at 22:35

1 Answers1

1

As answered in the comments above.

The program was run in a non-optimized debug build and the time reduced to only 3 seconds after switching to release mode.

The remaining difference might be because the runtime library uses some method like memcmp which is heavily optimized compared to looping and comparing characters one by one.

Omar Elawady
  • 3,300
  • 1
  • 13
  • 17