Why my performance benchmark gives me wrong results?

Question

There is clang-tidy option performance-faster-string-find that detect the use of the std::basic_string::find method (and related ones) with a single character string literal as argument. According to them, the use of a character literal is more efficient.

I wanted to perform a little benchmark to test that. Therefore, I made this little program:

#include <string>
#include <chrono>
#include <iostream>

int main() {
    int res = 0;
    std::string s(STRING_LITERAL);

    auto start = std::chrono::steady_clock::now();

    for(int i = 0; i < 10000000; i++) {
#ifdef CHAR_TEST
        res += s.find('A');
#else  
        res += s.find("A");
#endif
    }

    auto end = std::chrono::steady_clock::now();

    std::chrono::duration<double> elapsed_seconds = end-start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";

    return res;
}

Two macros are used in this program:

STRING_LITERAL which will be the content of the std::string on which we will call the find function. On my benchmark, this macro can have two values: a small string, let's say "BAB" or a long string, let's say "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB",
CHAR_TEST, if defined, run the benchmark for character literal. If not, find is called with single character string literal.

Here are the results:

> (echo "char with small string" ; g++ -DSTRING_LITERAL=\"BAB\" -DCHAR_TEST -O3 -o toy_exe toy.cpp && ./toy_exe) ; (echo "string literal with small string" ; g++ -DSTRING_LITERAL=\"BAB\" -O3 -o toy_exe toy.cpp && ./toy_exe) ; (echo "char with long string" ; g++ -DSTRING_LITERAL=\"BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\" -DCHAR_TEST -O3 -o toy_exe toy.cpp && ./toy_exe) ; (echo "string literal with long string" ; g++ -DSTRING_LITERAL=\"BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\" -O3 -o toy_exe toy.cpp && ./toy_exe)

char with small string
elapsed time: 0.0551678s
string literal with small string
elapsed time: 0.0493302s

char with long string
elapsed time: 0.0599704s
string literal with long string
elapsed time: 0.188888s

My quite ugly command runs the benchmark for the four possible combinations of the macros and I found, with a long std::string, it is indeed more efficient to use a character literal as argument to find but it is no longer true for small std::string. I repeated the experiment and I always find an increase of around 10% of the execution time for character literal with small std::string.

In parallel, one of my workmates made some benchmarks on quick-bench.com and found the following results:

Small std::string with character literal: 11 units of time
Small std::string with single character string literal: 20 units of time
Long std::string with character literal: 13 units of time
Long std::string with single character string literal: 22 units of time

These results are coherent with what claims clang-tidy (and sounds logical). So, what is wrong with my benchmark? Why have I consistent wrong results?

EDIT: This benchmark has been performed using GCC 6.3.0 on Debian. I also run it using Clang 8.0.0 for similar results.

@mlc In all of my tests, I used the `-O3` optimization level. The same is used for the benchmarks made on quick-bench.com. — Pierre, May 28 '20 at 09:13
I assume that you also run that in the same compiler version and with the same c++ standard? — mlc, May 28 '20 at 09:27
@mlc Yes, you can see the command in my question: `g++ -DSTRING_LITERAL=\"BAB\" -DCHAR_TEST -O3 -o toy_exe toy.cpp`. However, I also made some test with clang and with `-std=c++11` and observe similar results. — Pierre, May 28 '20 at 09:31
Looking at the code at quick-bench I see that they use ```benchmark::DoNotOptimize()```. It seems that compiler makes different optimizations for you and for the code that is on quick-bench. See: https://github.com/google/benchmark#preventing-optimization. — mlc, May 28 '20 at 11:08
@mlc Thanks for pointing that. I observed that if I remove all optimizations in my benchmarks, the use of a character literal is indeed faster, for small and long strings. But it would be surprising that clang-tidy has a check that is fully valid only for non optimized code. — Pierre, May 28 '20 at 12:14
Has anyone considered the a std::string with one character is not a simply a character but rather a class which manages strings? https://stackoverflow.com/questions/1287306/difference-between-string-and-char-types-in-c — jiveturkey, Jun 03 '20 at 13:54
I believe there some situations can be guessed at compile time. Instead of searching a constant inside another constant, try to enter variable with cin or read it from a file instead of just 'A' or "A" or "BBBBBBABBBBBBBBBB". Also consider declaring it volatile, so the compiler will not apply optimisations. — armagedescu, Jun 08 '20 at 10:38
@armagedescu I already did that and it gives me the same results. — Pierre, Jun 08 '20 at 12:13

score 3 · Answer 1 · answered Jun 08 '20 at 15:55

I did not like the idea of controlling the program via macros, so I rewrote it into this:

#include <string>
#include <chrono>
#include <iostream>

template <typename T>
int test(std::string s, T pattern, const std::string & msg, size_t num_repeat)
{
  int res = 0;
  auto start = std::chrono::steady_clock::now();

  for(int i = 0; i < num_repeat; i++) 
  {
    s.find(pattern);
    s[0] = '.';
  }

  auto end = std::chrono::steady_clock::now();
  std::chrono::duration<double> elapsed_seconds = end-start;
  std::cout << msg << " elapsed time: " << elapsed_seconds.count() << "s\n";

  return res;

}


int main(int argc, const char* argv[]) 
{
  const int N = 10'000'000;
  int res = 0;
  std::string s = (argc == 1) ? "MNBVCXZLKJHGFDSAPOIUYTREWQ" : argv[1];

  res += test(s, 'A', s + ".find(char): ", N);
  res += test(s, "A", s + ".find(string): ", N);

  return res & 1;
}

The main idea was to fool the compiler enough to make it abandon any idea of optimizing things out (that's the purpose of s[1] = '.' and reading s from commandline). I wanted to avoid the situation where the compiler knowss both the string and the pattern searched for, as this could let it use some optimization tricks we do not want to take int account.

I compiled it using gcc 10.1.0 and clang 10.0.0, with -O3 as the only command-line option. (g++ was run with -std=c++17, I have it aliased).

The results depend on the compiler (which can be already observed in the benchmark linked in the question!)

Ok. small strings, g++:

pA1.find(char):  elapsed time: 0.124409s
pA1.find(string):  elapsed time: 0.125372s

clang:

pA1.find(char):  elapsed time: 0.122489s
pA1.find(string):  elapsed time: 0.126854s

The difference is hardly measurable. clang yields systematically larger times for strings, but this is typically on the 3rd significant digit, hardly worth mentioning.

Now medium-size strings, g++:

00000000000000000000000000000000000000000000000pA1.find(char):  elapsed time: 0.139219s
00000000000000000000000000000000000000000000000pA1.find(string):  elapsed time: 0.137838s

clang:

00000000000000000000000000000000000000000000000pA1.find(char):  elapsed time: 0.13962s
00000000000000000000000000000000000000000000000pA1.find(string):  elapsed time: 0.153506s

The ressult for clang are systematically in favor of "char" method; as for g++, the winner fluctuates.

Now even larger strings, g++:

111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000000000000000000pA1.find(char):  elapsed time: 0.170651s
111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000000000000000000pA1.find(string):  elapsed time: 0.177381s

clang:

111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000000000000000000pA1.find(char):  elapsed time: 0.172215s
111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000000000000000000pA1.find(string):  elapsed time: 0.206911s

For g++ the difference hardly starts to be possible to observe, it is within expected range of random fluctuations. For clang the difference is clear & systematic.

I repeated it with a string composed of approx 1000 chars. For g++ there's no difference, for clang it's about 10%.

So, my conclusion is that all depends on the compiler. For clang it is reasonable to follow the advice issued by clang-tidy. For g++, this need not be the case.

This answer is not complete, though, for it would be interesting to know the differences in implementation of std::string::find between clang and g++.

score 0 · Answer 2 · answered Jun 03 '20 at 13:23

0

I am not sure something is wrong with your bench marking. I run the exact same code on repl.io platform, and I get results that are matching "quick bench":

char with small string .elapsed time: 0.402103s

string literal with small string elapsed time: 0.489828s

char with long string elapsed time: 0.400224s

string literal with long string elapsed time: 0.53304s

There is one thing that comes into mind, your profiling is done over the loop, I would profile just whats in the loop.

answered Jun 03 '20 at 13:23

Yitshak Yarom

84
1
4

What compiler did you use? And what version? I added the versions I used in my question. Also, did you try to run the benchmark on a physical machine? – Pierre Jun 04 '20 at 08:14
g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 - I didn't try running it on physical machine – Yitshak Yarom Jun 04 '20 at 14:23
Unfortunately, I don't have this version of g++ to check if I have similar results. I am wondering if the results depends of the compiler and its version. – Pierre Jun 04 '20 at 15:26

Why my performance benchmark gives me wrong results?

2 Answers2