My SSE code is just as slow as the standard C version. What am I doing wrong?
I'm running on an Intel i3-6100 CPU, using C with MinGW and CLion, and I'm compiling with the -O0 flag.
I'm measuring the performance using the clock() function. Both versions are equally fast to within about 45 ticks out of more than 1000 (SSE: 1138 ticks, C: 1093 ticks). I thought that SSE might somehow interfere with the clock() time measurement, but even when I simply count seconds there is no difference.
The function (I swap the commented-out lines to switch between the two versions):
void vTrace(struct Ray * ray, float t, struct Vec3f * r){
    //__m128 * mr = (__m128 *)r;
    //__m128 mt_m = _mm_set1_ps(t);
    //*mr = _mm_add_ps(*(__m128*)&ray->o, _mm_mul_ps(*(__m128*)&ray->d, mt_m));
    r->x = ray->o.x + ray->d.x*t;
    r->y = ray->o.y + ray->d.y*t;
    r->z = ray->o.z + ray->d.z*t;
}
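For reference, here is a self-contained sketch of the two variants side by side, so they can be compiled and compared outside my project. It is not my exact code: the names vTraceScalar, vTraceSSE and vTraceMaxDiff are made up for this example, and the SSE variant uses unaligned loads and stores (_mm_loadu_ps / _mm_storeu_ps) instead of the pointer casts above, on the assumption that the structs may not be 16-byte aligned.

```c
#include <math.h>
#include <xmmintrin.h>  /* SSE intrinsics */

struct Vec3f { float x, y, z, w; };
struct Ray { struct Vec3f o, d, inverse_d; };

/* Scalar version: r = o + d * t, w is left untouched. */
void vTraceScalar(const struct Ray *ray, float t, struct Vec3f *r) {
    r->x = ray->o.x + ray->d.x * t;
    r->y = ray->o.y + ray->d.y * t;
    r->z = ray->o.z + ray->d.z * t;
}

/* SSE version: processes x, y, z and w in one go.
   Unaligned loads/stores are used so that no alignment is assumed. */
void vTraceSSE(const struct Ray *ray, float t, struct Vec3f *r) {
    __m128 o = _mm_loadu_ps((const float *)&ray->o);
    __m128 d = _mm_loadu_ps((const float *)&ray->d);
    __m128 res = _mm_add_ps(o, _mm_mul_ps(d, _mm_set1_ps(t)));
    _mm_storeu_ps((float *)r, res);
}

/* Hypothetical helper: largest difference between the two versions
   on x, y and z for one sample ray. */
float vTraceMaxDiff(void) {
    struct Ray ray = { {0.2f, 0.23f, 1.4f, 0.0f},
                       {0.2f, 0.23f, 1.4f, 0.0f},
                       {0.0f, 0.0f, 0.0f, 0.0f} };
    struct Vec3f a = {0.0f, 0.0f, 0.0f, 0.0f};
    struct Vec3f b = {0.0f, 0.0f, 0.0f, 0.0f};
    float diff;

    vTraceScalar(&ray, 1.0f, &a);
    vTraceSSE(&ray, 1.0f, &b);

    diff = fabsf(a.x - b.x);
    if (fabsf(a.y - b.y) > diff) diff = fabsf(a.y - b.y);
    if (fabsf(a.z - b.z) > diff) diff = fabsf(a.z - b.z);
    return diff;
}
```

Calling vTraceMaxDiff() only checks that both variants compute the same point; it says nothing about speed.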
The benchmarking code:
float benchmark_t = 1;
struct Ray benchmark_ray;
vInit3f(&benchmark_ray.o, 0.2, 0.23, 1.4);
vInit3f(&benchmark_ray.d, 0.2, 0.23, 1.4);
ticks = clock();
i = 0;
while (i < 1000000000) {
    vTrace(&benchmark_ray, benchmark_t, &benchmark_ray.o);
    i++;
}
/* clock_t is not guaranteed to be int, so cast before printing */
printf("TIME : %ld ticks\n", (long)(clock() - ticks));
printVec("result", benchmark_ray.o);
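Since the tick counts above are hard to compare with my "counting seconds" check, here is a small sketch of how the two clock() readings could be converted to seconds. The helper name ticksToSeconds is made up for this example; CLOCKS_PER_SEC is the standard constant from time.h.

```c
#include <time.h>

/* Hypothetical helper: converts the difference between two clock()
   readings into seconds using the standard CLOCKS_PER_SEC constant. */
double ticksToSeconds(clock_t start, clock_t end) {
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

In the benchmark this would be used as ticksToSeconds(ticks, clock()) right after the loop, printed with %f.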
The structures:
struct Vec3f{
    float x;
    float y;
    float z;
    float w; // only there for SSE
};
struct Ray{
    struct Vec3f o;
    struct Vec3f d;
    struct Vec3f inverse_d;
};
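In case alignment matters for the __m128 casts in vTrace: the structs above carry no alignment requirement beyond that of float. Below is a sketch of what explicitly aligned variants might look like, assuming a C11 compiler (_Alignas). The names Vec3fAligned, RayAligned and vTraceAligned are made up for this example and are not in my code.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical aligned variants of the structs above (C11 _Alignas).
   Aligning the first member forces the whole struct to 16 bytes. */
struct Vec3fAligned {
    _Alignas(16) float x;
    float y;
    float z;
    float w; /* only there for SSE */
};

struct RayAligned {
    struct Vec3fAligned o;
    struct Vec3fAligned d;
    struct Vec3fAligned inverse_d;
};

/* With 16-byte alignment guaranteed by the type, the aligned
   load/store intrinsics can be used. */
void vTraceAligned(const struct RayAligned *ray, float t,
                   struct Vec3fAligned *r) {
    __m128 o = _mm_load_ps((const float *)&ray->o);
    __m128 d = _mm_load_ps((const float *)&ray->d);
    _mm_store_ps((float *)r, _mm_add_ps(o, _mm_mul_ps(d, _mm_set1_ps(t))));
}
```

Each Vec3fAligned is 16 bytes, so the three members of RayAligned stay 16-byte aligned as well.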
With SSE I expected the performance to be about four times better. Why is there no performance gain?