There is a clang-tidy check, performance-faster-string-find, that detects calls to the std::basic_string::find method (and related ones) with a single-character string literal as argument. According to its documentation, using a character literal is more efficient.
I wanted to run a small benchmark to test that, so I wrote this little program:
#include <string>
#include <chrono>
#include <iostream>

int main() {
    int res = 0;
    std::string s(STRING_LITERAL);
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < 10000000; i++) {
#ifdef CHAR_TEST
        res += s.find('A');
#else
        res += s.find("A");
#endif
    }
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed_seconds = end - start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";
    return res;
}
Two macros are used in this program:
- STRING_LITERAL, which is the content of the std::string on which the find function is called. In my benchmark, this macro takes one of two values: a small string, say "BAB", or a long string, say "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB".
- CHAR_TEST, which, if defined, runs the benchmark with a character literal. If it is not defined, find is called with a single-character string literal.
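To make explicit what is being compared: the two variants go through different overloads of find, one taking a char and one taking a null-terminated string, and both should report the same position. Here is a minimal sketch of that check, assuming the same two test strings as in the benchmark. The helper names find_char, find_literal and check_same_result are hypothetical and are not part of the benchmark program.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: goes through the find(char) overload.
std::size_t find_char(const std::string& s) {
    return s.find('A');
}

// Hypothetical helper: goes through the find(const char*) overload,
// which receives a null-terminated string rather than a single char.
std::size_t find_literal(const std::string& s) {
    return s.find("A");
}

// Checks that both overloads agree on the two benchmark strings.
void check_same_result() {
    const std::string small_s("BAB");
    const std::string long_s(
        "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB");
    assert(find_char(small_s) == find_literal(small_s));
    assert(find_char(long_s) == find_literal(long_s));
    assert(find_char(long_s) != std::string::npos);
}
```

Calling check_same_result() from a main function only confirms that the two variants are interchangeable in terms of results; it says nothing about their relative speed.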
Here are the results of the benchmark:
> (echo "char with small string" ; g++ -DSTRING_LITERAL=\"BAB\" -DCHAR_TEST -O3 -o toy_exe toy.cpp && ./toy_exe) ; (echo "string literal with small string" ; g++ -DSTRING_LITERAL=\"BAB\" -O3 -o toy_exe toy.cpp && ./toy_exe) ; (echo "char with long string" ; g++ -DSTRING_LITERAL=\"BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\" -DCHAR_TEST -O3 -o toy_exe toy.cpp && ./toy_exe) ; (echo "string literal with long string" ; g++ -DSTRING_LITERAL=\"BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB\" -O3 -o toy_exe toy.cpp && ./toy_exe)
char with small string
elapsed time: 0.0551678s
string literal with small string
elapsed time: 0.0493302s
char with long string
elapsed time: 0.0599704s
string literal with long string
elapsed time: 0.188888s
My rather ugly command runs the benchmark for the four possible combinations of the macros. I found that, with a long std::string, it is indeed more efficient to pass a character literal to find, but this no longer holds for a small std::string. I repeated the experiment and consistently saw the execution time increase by around 10% for the character literal with a small std::string.
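Rather than rerunning the executable by hand, the repetition could also be done inside the program. Here is a rough sketch, assuming that keeping the fastest of several runs is an acceptable way to reduce noise. The names time_once and best_of are hypothetical helpers, and the volatile sink is only an attempt to keep the compiler from discarding the loop, not a guarantee that it measures what is intended.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical helper: times `iterations` calls of fn(s), returns seconds.
template <typename Fn>
double time_once(const std::string& s, int iterations, Fn fn) {
    volatile std::size_t sink = 0;  // makes every result an observable write
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        sink = sink + fn(s);
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

// Hypothetical helper: repeats the measurement and keeps the fastest run.
template <typename Fn>
double best_of(int runs, const std::string& s, int iterations, Fn fn) {
    double best = time_once(s, iterations, fn);
    for (int r = 1; r < runs; ++r) {
        best = std::min(best, time_once(s, iterations, fn));
    }
    return best;
}
```

For example, best_of(5, s, 10000000, [](const std::string& x) { return x.find('A'); }) would time the character-literal variant, and the same call with x.find("A") would time the string-literal variant.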
In parallel, one of my workmates ran some benchmarks on quick-bench.com and got the following results:
- Small std::string with character literal: 11 units of time
- Small std::string with single-character string literal: 20 units of time
- Long std::string with character literal: 13 units of time
- Long std::string with single-character string literal: 22 units of time
These results are consistent with what clang-tidy claims (and they seem logical). So, what is wrong with my benchmark? Why do I consistently get wrong results?
EDIT: This benchmark was run using GCC 6.3.0 on Debian. I also ran it using Clang 8.0.0, with similar results.