I'm using a SIMD implementation that replicates the STL's `std::transform`. All the vectors I'm using are aligned.
When I use three separate vectors for the transform, the performance of the SIMD transform (which uses `_mm512_and_si512`) is identical to that of `std::transform`. However, if I instead use one vector for both input ranges, I get a 1.33x speedup with SIMD. The speedup is ~3x when I use the same vector for all of the `transform` arguments.
```cpp
std::transform(first1, last1, first2, d_first,
               [](const auto& a, const auto& b) { return a & b; }); // identical performance with SIMD
std::transform(first1, last1, first1 + 1, d_first,
               [](const auto& a, const auto& b) { return a & b; }); // 1.33x speedup with SIMD
std::transform(first1, last1, first1 + 1, first1,
               [](const auto& a, const auto& b) { return a & b; }); // ~3x speedup with SIMD
```
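A self-contained harness for timing these three `std::transform` patterns might look like the following. The element type, fill values and the names `time_ms`, `Timings` and `run_patterns` are assumptions for illustration. `v1` gets one extra element so that the reads through `first1 + 1` stay in bounds, and each pattern is timed only once, so real measurements should repeat the runs.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical timing helper: runs f once and returns the elapsed milliseconds.
template <class F>
double time_ms(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

struct Timings {
    double three_vectors;
    double shared_input;
    double fully_in_place;
};

// Times the three call patterns on n elements.
Timings run_patterns(std::size_t n) {
    // v1 has n + 1 elements so [first1 + 1, last1 + 1) is a valid range.
    std::vector<std::uint32_t> v1(n + 1, 0xF0F0F0F0u), v2(n, 0x0FF00FF0u), dst(n);
    auto and_op = [](const auto& a, const auto& b) { return a & b; };
    auto first1 = v1.begin(), last1 = v1.begin() + n;

    Timings t{};
    t.three_vectors  = time_ms([&] { std::transform(first1, last1, v2.begin(), dst.begin(), and_op); });
    t.shared_input   = time_ms([&] { std::transform(first1, last1, first1 + 1, dst.begin(), and_op); });
    t.fully_in_place = time_ms([&] { std::transform(first1, last1, first1 + 1, first1, and_op); });
    return t;
}
```

`run_patterns(1 << 24)` returns the three elapsed times in milliseconds; replacing the `std::transform` calls with the SIMD version gives the other half of the comparison.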
What is the reason for there being no performance difference with three separate vectors? Is it just a coincidence that the SIMD version performs the same as the non-SIMD one?