Here's a snippet of code I have:
    for (int oscIndex = 0; oscIndex < kNumOscs; oscIndex++) {
        for (int voiceIndex = 0; voiceIndex < numVoices; voiceIndex += 4) {
            const int v = voiceIndex / 4;
            // vol
            osc[oscIndex][v] = _mm_mul_ps(osc[oscIndex][v], vol[oscIndex][v]);
            // prev output
            mPrevOutput[oscIndex][v] = osc[oscIndex][v];
            // out
            osc[oscIndex][v] = _mm_mul_ps(osc[oscIndex][v], out[oscIndex][v]);
        }
    }
Is it correct to copy the values into mPrevOutput this way, or would a single memcpy be faster? mPrevOutput and osc have the same size (in this case kNumOscs = 4 x numVoices = 16 floats, i.e. 4 x 4 m128 values).
I'm on a 64-bit Windows machine, compiling with FLAGS += -O3 -march=nocona -funsafe-math-optimizations
This is how they are defined:
    alignas(16) std::array<std::array<m128, 4>, kNumOscs> mPrevOutput; // member of a class
    m128 osc[4][4]; // local variable, declared on every call of the class's function
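
To make the comparison concrete, here is a minimal sketch of both variants side by side. It assumes that m128 is an alias for __m128 and that vol and out have the same 4 x 4 layout as mPrevOutput (their declarations are not shown above). The names Block, processAssign, processMemcpy and variantsAgree are made up for illustration. Note that the memcpy version needs the loop split in two, because the copy has to happen between the two multiplications.

```cpp
#include <array>
#include <cassert>   // assert, for callers that check variantsAgree()
#include <cstring>   // std::memcpy, std::memcmp
#include <xmmintrin.h>

// Assumption: m128 is an alias for the SSE type __m128 (not shown in the question).
using m128 = __m128;
constexpr int kNumOscs = 4;
constexpr int kNumVecs = 4;  // 16 voices / 4 floats per m128

// Hypothetical name for the layout used by mPrevOutput (and assumed for vol/out).
using Block = std::array<std::array<m128, kNumVecs>, kNumOscs>;

// Variant A: element-wise copy inside the loop, as in the snippet above.
void processAssign(m128 (&osc)[kNumOscs][kNumVecs], const Block& vol,
                   const Block& out, Block& prevOutput) {
    for (int o = 0; o < kNumOscs; o++) {
        for (int v = 0; v < kNumVecs; v++) {
            osc[o][v] = _mm_mul_ps(osc[o][v], vol[o][v]);
            prevOutput[o][v] = osc[o][v];
            osc[o][v] = _mm_mul_ps(osc[o][v], out[o][v]);
        }
    }
}

// Variant B: one memcpy, which forces the work into two separate passes.
void processMemcpy(m128 (&osc)[kNumOscs][kNumVecs], const Block& vol,
                   const Block& out, Block& prevOutput) {
    static_assert(sizeof(Block) == sizeof(m128[kNumOscs][kNumVecs]),
                  "memcpy requires both containers to have the same size");
    for (int o = 0; o < kNumOscs; o++)
        for (int v = 0; v < kNumVecs; v++)
            osc[o][v] = _mm_mul_ps(osc[o][v], vol[o][v]);

    // Copy the whole 4 x 4 block of vectors in one call.
    std::memcpy(prevOutput.data(), osc, sizeof(osc));

    for (int o = 0; o < kNumOscs; o++)
        for (int v = 0; v < kNumVecs; v++)
            osc[o][v] = _mm_mul_ps(osc[o][v], out[o][v]);
}

// Runs both variants on identical input and compares the results byte for byte.
bool variantsAgree() {
    m128 oscA[kNumOscs][kNumVecs];
    m128 oscB[kNumOscs][kNumVecs];
    Block vol, out, prevA, prevB;
    for (int o = 0; o < kNumOscs; o++) {
        for (int v = 0; v < kNumVecs; v++) {
            oscA[o][v] = oscB[o][v] = _mm_set1_ps(1.0f + o + 0.25f * v);
            vol[o][v] = _mm_set1_ps(0.5f);
            out[o][v] = _mm_set1_ps(2.0f);
        }
    }
    processAssign(oscA, vol, out, prevA);
    processMemcpy(oscB, vol, out, prevB);
    return std::memcmp(oscA, oscB, sizeof(oscA)) == 0 &&
           std::memcmp(prevA.data(), prevB.data(), sizeof(Block)) == 0;
}
```

Both variants are expected to produce identical results, which variantsAgree() checks. Which one is faster would have to be measured with the actual compiler and flags.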