Is there a way to optimize std algorithms?

Question

Searching any information about std algorithms performance, I've found the Stack Overflow question about performance difference between std::max_element() and self-written function. I've tested the functions from the question with GCC 9.2.0 and found no performance difference, i.e. my_max_element_orig() and my_max_element_changed() (from the accepted answer) show the same performance. So it seems like it was just an optimizer issue in GCC 4.8.2. What I really found for GCC 9.2.0 is the significant difference in case I use pointers and iterators - using iterators is about 2 times worse against raw pointers. And there is the similar difference for iterators and raw pointers if using std::max_element().

Let's take my_max_element_orig function implementation (see below) and try to run the test.

template<typename _ForwardIterator>
_ForwardIterator my_max_element_orig(_ForwardIterator __first, _ForwardIterator __last)
{
  if (__first == __last) return __first;
  _ForwardIterator __result = __first;
  while (++__first != __last)
    if (*__result < *__first)
      __result = __first;
  return __result;
}

The following usage example

int maxValue = *my_max_element_orig(begin(vec), end(vec));

is worse than the following (raw pointers)

int maxValue = *my_max_element_orig(vec.data(), vec.data() + vec.size());

Someone may say that the reason is that iterator class implementation introduces some overhead. However I found that the reason is the presense of the following line:

if (__first == __last) return __first;

If the line above is removed from the function, then iterators show the same performance as raw pointers. After some experiments I decided to intervene to the optimizer's branch prediction and replaced the line with the following:

#define unlikely(x)     __builtin_expect((x),0)
...
if (unlikely(__first == __last)) return __first;

With the changes above my_max_element_orig() function shows the same performance regardless of using iterators or raw pointers. I made the similar change in the std::max_element() function from std_algo.h file, and got the same results - now std::max_element() shows the same performance for both iterators and raw pointers.

The fact is both the original question I linked to and my finding are about "how does GCC optimizer work" or "is it an optimizer issue" questions. However I'd like to use std algorithms, and I don't want to re-write them to get more optimized code. So I'm wondering if there is a way to change branch prediction for std::max_element(), like I did it above for my own function. Or more general, is there a way to get std algorithms more optimized without re-writing them?

GCC 9.2.0
SUSE Linux Enterprise Server 11 (x86_64)
g++ -DNDEBUG -O3 -Wall -fmessage-length=0 --std=c++17
Test programs: https://godbolt.org/z/HrABJt

Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/202404/discussion-on-question-by-rom098-is-there-a-way-to-optimize-std-algorithms). — Samuel Liew, Nov 15 '19 at 12:55

Is there a way to optimize std algorithms?

0 Answers0

Linked