I've found this snippet while looking at <algorithm>
implementations:
/**
* @if maint
* This is an overload used by find() for the RAI case.
* @endif
*/
template<typename _RandomAccessIterator, typename _Tp>
_RandomAccessIterator
find(_RandomAccessIterator __first, _RandomAccessIterator __last,
const _Tp& __val, random_access_iterator_tag)
{
typename iterator_traits<_RandomAccessIterator>::difference_type
__trip_count = (__last - __first) >> 2;
for ( ; __trip_count > 0 ; --__trip_count)
{
if (*__first == __val)
return __first;
++__first;
if (*__first == __val)
return __first;
++__first;
if (*__first == __val)
return __first;
++__first;
if (*__first == __val)
return __first;
++__first;
}
switch (__last - __first)
{
case 3:
if (*__first == __val)
return __first;
++__first;
case 2:
if (*__first == __val)
return __first;
++__first;
case 1:
if (*__first == __val)
return __first;
++__first;
case 0:
default:
return __last;
}
}
For my understanding the only "trick" here is to perform distance / 4
iterations with four if(...) return; ++
and the same with the remaining distance % 4
items inside a switch
. So as expected, exactly the same number of comparisons and incrementation is performed, and theoretical complexity is the same. How is it a better optimization than the trivial input iterator implementation? Is it a micro-optimization to lower the number of loop iterations or is there something smarter I don't get?