I just figured out how to compute the dot product of two arrays with SIMD. The following scalar code:
int A[8] = {1,2,3,4,5,1,2,3};
int B[8] = {2,3,4,5,6,2,3,4};
float result = 0;
for (int i = 0; i < 8; i++) {
    result += A[i] * B[i];
}
is equivalent to the following SSE version:
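#include <pmmintrin.h>  // SSE3, needed for _mm_hadd_ps

int A[8] = {1,2,3,4,5,1,2,3};
int B[8] = {2,3,4,5,6,2,3,4};
float result = 0;
__m128 r1 = {0,0,0,0};
__m128 r2 = {0,0,0,0};
__m128 r3 = {0,0,0,0};
for (int i = 0; i < 8; i += 4) {
    // convert four ints from each array to float
    float C[4] = {(float)A[i], (float)A[i+1], (float)A[i+2], (float)A[i+3]};
    float D[4] = {(float)B[i], (float)B[i+1], (float)B[i+2], (float)B[i+3]};
    __m128 a = _mm_loadu_ps(C);
    __m128 b = _mm_loadu_ps(D);
    r1 = _mm_mul_ps(a, b);                     // four products
    r2 = _mm_hadd_ps(r1, r1);                  // pairwise sums
    r3 = _mm_add_ss(_mm_hadd_ps(r2, r2), r3);  // add this block's total to lane 0
}
_mm_store_ss(&result, r3);                     // write the running sum out once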
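For comparison, here is a minimal self-contained sketch of the same dot product that keeps four partial sums in a vector register and reduces them once after the loop, instead of doing two horizontal adds per iteration. It swaps _mm_hadd_ps (SSE3) for a shuffle-based reduction that only needs SSE2, and converts the int data with _mm_cvtepi32_ps. The function name dot_sse and the requirement that n be a multiple of 4 are assumptions made for illustration.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_cvtepi32_ps */

/* Hypothetical helper: dot product of two int arrays.
   Assumes n is a multiple of 4. */
static float dot_sse(const int *a, const int *b, int n)
{
    __m128 acc = _mm_setzero_ps();  /* four running partial sums */
    for (int i = 0; i < n; i += 4) {
        /* load four ints from each array and convert them to float */
        __m128 va = _mm_cvtepi32_ps(_mm_loadu_si128((const __m128i *)(a + i)));
        __m128 vb = _mm_cvtepi32_ps(_mm_loadu_si128((const __m128i *)(b + i)));
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    /* horizontal sum of the four lanes, done once */
    __m128 hi    = _mm_movehl_ps(acc, acc);   /* lanes 0,1 now hold acc2, acc3 */
    __m128 sum2  = _mm_add_ps(acc, hi);       /* lane0 = acc0+acc2, lane1 = acc1+acc3 */
    __m128 lane1 = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
    return _mm_cvtss_f32(_mm_add_ss(sum2, lane1));
}
```

Called as dot_sse(A, B, 8) on the arrays above, this should give the same value as the scalar loop.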
Now I am curious how to write the SIMD equivalent when the elements I want to multiply are not consecutive in the array. For example, what would the SIMD version of the following be?
int A[8] = {1,2,3,4,5,1,2,3};
int B[8] = {2,3,4,5,6,2,3,4};
float result = 0;
for (int i = 0; i < 8; i++) {
    for (int j = 0; j < 8; j++) {
        result += A[foo(i)] * B[foo(j)];
    }
}
Here foo is some function that maps its int argument to an int, which is then used as the array index.
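A rough sketch of one possible approach, under stated assumptions. Because every A[foo(i)] is multiplied by every B[foo(j)], the double sum factors into (sum over i of A[foo(i)]) times (sum over j of B[foo(j)]), so only two gathered sums are needed rather than 64 products. SSE has no gather instruction, so the sketch builds each vector from scalar loads with _mm_set_epi32. The body of foo and the helper names gathered_sum and double_sum are hypothetical placeholders; foo is assumed to return a valid index, n is assumed to be a multiple of 4, and the integer sums are assumed not to overflow.

```c
#include <emmintrin.h>  /* SSE2 integer intrinsics */

/* Hypothetical stand-in for the question's foo; any index mapping works. */
static int foo(int i) { return (i * 3) % 8; }

/* Sum of x[foo(0)] .. x[foo(n-1)]. Assumes n is a multiple of 4. */
static int gathered_sum(const int *x, int n)
{
    __m128i acc = _mm_setzero_si128();
    for (int i = 0; i < n; i += 4) {
        /* manual gather: _mm_set_epi32 takes the highest lane first */
        __m128i v = _mm_set_epi32(x[foo(i + 3)], x[foo(i + 2)],
                                  x[foo(i + 1)], x[foo(i)]);
        acc = _mm_add_epi32(acc, v);
    }
    int lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);  /* spill and add the four lanes */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

/* Equivalent of the nested loop: the double sum is a product of two sums. */
static float double_sum(const int *a, const int *b, int n)
{
    return (float)gathered_sum(a, n) * (float)gathered_sum(b, n);
}
```

With the arrays from the question and the placeholder foo above (which happens to permute 0..7), the two gathered sums are 21 and 29. If the indices cannot be computed cheaply, the scalar loads in the gather will likely dominate, and the SIMD version may not beat a plain scalar loop.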