Searching any information about std algorithms performance, I've found the Stack Overflow question about performance difference between std::max_element()
and self-written function. I've tested the functions from the question with GCC 9.2.0 and found no performance difference, i.e. my_max_element_orig()
and my_max_element_changed()
(from the accepted answer) show the same performance. So it seems like it was just an optimizer issue in GCC 4.8.2. What I really found for GCC 9.2.0 is the significant difference in case I use pointers and iterators - using iterators is about 2 times worse against raw pointers. And there is the similar difference for iterators and raw pointers if using std::max_element()
.
Let's take my_max_element_orig
function implementation (see below) and try to run the test.
template<typename _ForwardIterator>
_ForwardIterator my_max_element_orig(_ForwardIterator __first, _ForwardIterator __last)
{
if (__first == __last) return __first;
_ForwardIterator __result = __first;
while (++__first != __last)
if (*__result < *__first)
__result = __first;
return __result;
}
The following usage example
int maxValue = *my_max_element_orig(begin(vec), end(vec));
is worse than the following (raw pointers)
int maxValue = *my_max_element_orig(vec.data(), vec.data() + vec.size());
Someone may say that the reason is that iterator class implementation introduces some overhead. However I found that the reason is the presense of the following line:
if (__first == __last) return __first;
If the line above is removed from the function, then iterators show the same performance as raw pointers. After some experiments I decided to intervene to the optimizer's branch prediction and replaced the line with the following:
#define unlikely(x) __builtin_expect((x),0)
...
if (unlikely(__first == __last)) return __first;
With the changes above my_max_element_orig()
function shows the same performance regardless of using iterators or raw pointers. I made the similar change in the std::max_element()
function from std_algo.h
file, and got the same results - now std::max_element()
shows the same performance for both iterators and raw pointers.
The fact is both the original question I linked to and my finding are about "how does GCC optimizer work" or "is it an optimizer issue" questions. However I'd like to use std algorithms, and I don't want to re-write them to get more optimized code. So I'm wondering if there is a way to change branch prediction for std::max_element()
, like I did it above for my own function. Or more general, is there a way to get std algorithms more optimized without re-writing them?
- GCC 9.2.0
- SUSE Linux Enterprise Server 11 (x86_64)
g++ -DNDEBUG -O3 -Wall -fmessage-length=0 --std=c++17
- Test programs: https://godbolt.org/z/HrABJt