I've been thinking about using SSE intrinsics to speed up my 3D software rasterizer, but I've never used them before and I suspect I'm going about it completely wrong.
I'd like to hear from more experienced people whether the effort is worth it, and whether this code is poorly written:
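#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_add_ps */

/* One 16-byte slot viewed either as four named floats or as an SSE register (MSVC alignment syntax). */
typedef union __declspec(align(16)) {
    struct { /* anonymous struct: standard since C11, a compiler extension before that */
        float x;
        float y;
        float z;
        float w;
    };
    __m128 m128;
} Vec4_t;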
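/* Adds a and b component-wise with a single SSE add. */
Vec4_t AddVec(const Vec4_t *a, const Vec4_t *b) {
    Vec4_t result;
    /* Assign through the union member rather than casting a pointer to the __m128. */
    result.m128 = _mm_add_ps(a->m128, b->m128);
    return result;
}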
This is how I'm testing it:
#include <stdio.h>

int main(void) {
    Vec4_t a = { 2.0f, 4.0f, 10.0f, 123.1f };
    Vec4_t b = { 6.0f, 12.0f, 16.0f, 64.0f };
    Vec4_t c = AddVec(&a, &b);
    printf("%f, %f, %f, %f\n", c.x, c.y, c.z, c.w);
    return 0;
}
which outputs:
8.000000, 16.000000, 26.000000, 187.100006
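For comparison, here is a minimal sketch of the same addition without the union, assuming plain `float` arrays and a compiler that ships `<xmmintrin.h>` (MSVC, GCC, and Clang all do). The names `add_vec4` and `add_vec4_array` are hypothetical and not part of my rasterizer. It uses the unaligned load/store intrinsics so the arrays need no alignment attribute; whether this is any faster than the union version is something I have not measured.

```c
#include <stddef.h>     /* size_t */
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper: out = a + b for one 4-float vector.
   Unaligned loads/stores are used, so a, b and out may live anywhere. */
static void add_vec4(const float a[4], const float b[4], float out[4]) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}

/* Hypothetical batch helper: adds `count` vectors stored back to back
   (x0 y0 z0 w0 x1 y1 z1 w1 ...). Each array holds 4 * count floats. */
static void add_vec4_array(const float *a, const float *b, float *out, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        add_vec4(a + 4 * i, b + 4 * i, out + 4 * i);
    }
}
```

Calling `add_vec4` on the two test vectors above should give the same four sums as `AddVec`. The batch version is only there to show the shape of a loop over many vertices, which is presumably where a rasterizer would spend its time.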
I honestly have no idea what I'm doing. I'm surprised the code I wrote even worked.