According to the official Intel C++ Intrinsic Reference,
SSE2 has the following intrinsic:
__m128i _mm_madd_epi16(__m128i a, __m128i b)
Multiplies the 8 signed 16-bit integers from a by the 8 signed 16-bit integers from b. Adds the signed 32-bit integer results pairwise and packs the 4 signed 32-bit integer results.
while SSSE3 has
__m128i _mm_maddubs_epi16 (__m128i a, __m128i b)
Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed words.
Since I'm working with 8-bit pixels and can only use SSE2 (the target is an older architecture), I need an 8-bit madd instruction. How should I proceed?
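For reference, one possible SSE2-only approach is sketched below. It assumes the same operand convention as _mm_maddubs_epi16 (bytes of a are unsigned, bytes of b are signed), and the helper name maddubs_epi16_sse2 is made up for illustration. The idea is to widen the even and odd bytes to 16 bits, multiply them separately, and combine the two products with a saturating add.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper (the name is illustrative, not a real intrinsic):
   SSE2-only stand-in for _mm_maddubs_epi16.
   Assumption: bytes of a are unsigned, bytes of b are signed. */
static __m128i maddubs_epi16_sse2(__m128i a, __m128i b)
{
    /* Zero-extend the even and odd bytes of a into 16-bit lanes. */
    __m128i a_even = _mm_and_si128(a, _mm_set1_epi16(0x00FF));
    __m128i a_odd  = _mm_srli_epi16(a, 8);

    /* Sign-extend the even and odd bytes of b into 16-bit lanes. */
    __m128i b_even = _mm_srai_epi16(_mm_slli_epi16(b, 8), 8);
    __m128i b_odd  = _mm_srai_epi16(b, 8);

    /* Each product lies in [-32640, 32385], so the low 16 bits are exact. */
    __m128i prod_even = _mm_mullo_epi16(a_even, b_even);
    __m128i prod_odd  = _mm_mullo_epi16(a_odd,  b_odd);

    /* Saturating add of the two products in each adjacent pair. */
    return _mm_adds_epi16(prod_even, prod_odd);
}
```

If both inputs are unsigned pixels, or the sums must not saturate, an alternative would be to widen the bytes with _mm_unpacklo_epi8 and _mm_unpackhi_epi8 against a zero register and then call _mm_madd_epi16, which yields 32-bit sums.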