We live in wonderful times where you can answer this yourself, always, without asking anyone.
But first of all: accept that you're doing micro-optimizations. It really matters what exactly appears in the tests, how the values are accessed in terms of cache locality, and what's done when a test succeeds. It is a valid result that such micro-optimizations make no difference. It simply tells you that you're focusing on the wrong area.
It also tells you that the billions of dollars worth of R&D done by CPU manufacturers is paying off. Modern CPUs are quite amazing at doing everything possible to get performance out of the code. If you're not doing something ridiculously bad, like storing your items in a std::list<type*>, and you're not wasting memory on the items you test, you're already at a good starting point.
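To make the std::list<type*> remark concrete, here is a minimal sketch of the same sum computed over contiguous std::vector<int> storage and over a std::list<int*>. The function names are hypothetical, chosen only for illustration. Both return the same result; the difference is how memory is touched, which is why the pointer-chasing version usually loses for large inputs. Measure before drawing conclusions for your own data.

```cpp
#include <list>
#include <numeric>
#include <vector>

// Contiguous storage: the values sit next to each other in memory, so a
// linear scan walks through cache lines in order and the hardware
// prefetcher can stay ahead of the loop.
long long sum_contiguous(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0LL);
}

// Node-based storage of pointers: each step follows a list link and then a
// second pointer to the value, and neither is guaranteed to be anywhere near
// the previous one in memory.
long long sum_pointer_list(const std::list<int*>& values) {
    long long total = 0;
    for (const int* p : values) {
        total += *p;
    }
    return total;
}
```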
Look at the optimized assembly output on godbolt (or from your own run of the compiler!). Make sure that you did indeed enable optimization (-O3 for gcc/clang, /O2 for msvc); otherwise you'd be wasting time. Pay special attention to the compiler's attempts at vectorizing the loop.
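The loop in question isn't shown, so as a stand-in, a loop like the hypothetical count_above below is the kind of thing to paste into godbolt with optimization enabled (e.g. -O3 on gcc or clang). If the compiler vectorizes it, the listing on x86-64 shows packed SIMD instructions working on xmm/ymm registers several elements at a time instead of one compare per element. Whether that happens depends on the compiler, its version, and target flags such as -march, so treat this as something to check, not something to expect.

```cpp
#include <cstddef>

// Counts elements strictly greater than `threshold`.
// Each iteration only reads data[i] and adds to an accumulator, with no
// early exit and no writes through pointers, which makes it a typical
// candidate for auto-vectorization at high optimization levels.
std::size_t count_above(const int* data, std::size_t n, int threshold) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        count += data[i] > threshold ? 1 : 0;
    }
    return count;
}
```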
Run benchmarks. But understand what makes for good benchmarking. See, e.g., How can I benchmark the performance of C++ code?
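For a first rough number before reaching for a proper library, a harness along these lines can work. It is a sketch under stated assumptions: min_time_ns is a hypothetical name, and it assumes `work` returns a value (for example the result of the code under test) so the result can be stored. It does one untimed warm-up run, then reports the fastest of several timed runs, since the minimum is the least disturbed by interrupts and other processes. Note that storing into a volatile only keeps the result alive; it does not stop the optimizer from hoisting a pure computation out of the timed region, which is one reason Google Benchmark provides benchmark::DoNotOptimize and benchmark::ClobberMemory.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>

// Runs `work` once as a warm-up (caches, branch predictors, page faults),
// then times `repetitions` runs and returns the fastest one in nanoseconds.
template <typename Work>
double min_time_ns(Work&& work, std::size_t repetitions) {
    assert(repetitions > 0);
    using Clock = std::chrono::steady_clock;  // monotonic, unaffected by wall-clock changes

    volatile auto sink = work();  // warm-up run, not timed
    double best = std::numeric_limits<double>::max();
    for (std::size_t r = 0; r < repetitions; ++r) {
        const auto start = Clock::now();
        sink = work();  // store to a volatile so the result counts as used
        const auto stop = Clock::now();
        best = std::min(best, std::chrono::duration<double, std::nano>(stop - start).count());
    }
    (void)sink;
    return best;
}
```

Typical use is to wrap the code under test in a lambda, e.g. `min_time_ns([&] { return count_matches(data); }, 20)`, where count_matches stands for whatever you are measuring.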
Make your question more specific. As it stands, without you showing real code (properly minimized, though!), it can't be reasonably answered and should be closed.