I've implemented a vectorized version of the Black-Scholes formula using 256-bit SIMD and have written an unscientific benchmark that tells me I'm getting about a 20x performance boost, which is pretty good.
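
For context, here is a rough sketch of the kind of kernel I mean. This is not my actual code (that is in the Compiler Explorer link below); the helper names bs_call4 and map_lanes are made up for the illustration, the log/exp/CDF steps fall back to scalar calls just to keep the snippet self-contained, and I've placed the half constant inside the function for the sake of the example.

#include <immintrin.h>
#include <cmath>
#include <cstdio>

// Apply a scalar function to each of the four lanes of a __m256d.
template <class F>
static __m256d map_lanes(__m256d v, F f) {
    alignas(32) double t[4];
    _mm256_store_pd(t, v);
    for (double& x : t) x = f(x);
    return _mm256_load_pd(t);
}

// Standard normal CDF via the complementary error function.
static double norm_cdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// Prices four European calls per invocation:
// out[i] = S[i]*N(d1) - K[i]*exp(-r[i]*T[i])*N(d2).
static void bs_call4(const double* S, const double* K, const double* r,
                     const double* sigma, const double* T, double* out) {
    const static __m256d half = {0.5, 0.5, 0.5, 0.5};
    __m256d s  = _mm256_loadu_pd(S),  k  = _mm256_loadu_pd(K);
    __m256d rf = _mm256_loadu_pd(r),  sg = _mm256_loadu_pd(sigma);
    __m256d tt = _mm256_loadu_pd(T);
    __m256d volT  = _mm256_mul_pd(sg, _mm256_sqrt_pd(tt));
    __m256d logSK = map_lanes(_mm256_div_pd(s, k), [](double x) { return std::log(x); });
    __m256d drift = _mm256_mul_pd(_mm256_add_pd(rf, _mm256_mul_pd(half, _mm256_mul_pd(sg, sg))), tt);
    __m256d d1    = _mm256_div_pd(_mm256_add_pd(logSK, drift), volT);
    __m256d d2    = _mm256_sub_pd(d1, volT);
    __m256d disc  = map_lanes(_mm256_mul_pd(rf, tt), [](double x) { return std::exp(-x); });
    __m256d call  = _mm256_sub_pd(_mm256_mul_pd(s, map_lanes(d1, norm_cdf)),
                                  _mm256_mul_pd(_mm256_mul_pd(k, disc), map_lanes(d2, norm_cdf)));
    _mm256_storeu_pd(out, call);
}

int main() {
    double S[4] = {100, 100, 100, 100}, K[4] = {90, 100, 110, 120};
    double r[4] = {0.05, 0.05, 0.05, 0.05}, v[4] = {0.2, 0.2, 0.2, 0.2}, T[4] = {1, 1, 1, 1};
    double out[4];
    bs_call4(S, K, r, v, T, out);
    for (double c : out) std::printf("%.4f\n", c);   // one call price per lane
}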

One of my lines of code initializes a 256-bit vector with four double values of 0.5:

const static __m256d half = {0.5,0.5,0.5,0.5};

Now here's the thing: if I replace the above with

const static __m256d half = _mm256_set1_pd(0.5);

then I only get 1/4 of the performance boost! I can get the performance back if I remove the static part, but why?

Link to full example in Compiler Explorer: https://godbolt.org/z/vBq_fz

I am compiling the example with MSVC (VS 2019), 64-bit, in Release mode. This strange difference does not show up in Debug mode.
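
To make the pattern concrete without the full Black-Scholes code, here is a simplified sketch of the three variants (again not the linked example, just the shape of the change; the speed comments are what my benchmark reports):

#include <immintrin.h>

// Variant 1: braced initializer + static -- full speed in my benchmark.
void scale_braced_static(const double* in, double* out) {
    const static __m256d half = {0.5, 0.5, 0.5, 0.5};
    _mm256_storeu_pd(out, _mm256_mul_pd(_mm256_loadu_pd(in), half));
}

// Variant 2: _mm256_set1_pd + static -- only about 1/4 of the speed-up.
void scale_set1_static(const double* in, double* out) {
    const static __m256d half = _mm256_set1_pd(0.5);
    _mm256_storeu_pd(out, _mm256_mul_pd(_mm256_loadu_pd(in), half));
}

// Variant 3: _mm256_set1_pd without static -- full speed again.
void scale_set1_local(const double* in, double* out) {
    const __m256d half = _mm256_set1_pd(0.5);
    _mm256_storeu_pd(out, _mm256_mul_pd(_mm256_loadu_pd(in), half));
}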

Dmitri Nesteruk