SIMD integer ops have no problem at all with uninitialized values, since their latency is never data-dependent. Floating point is different: some FPUs slow down on denormals, NaNs, and infinities (in any one of the vector elements).
Intel Nehalem and earlier slow down a lot when doing math ops with denormal inputs/outputs, and on FP underflow/overflow. Sandy Bridge has a nice FPU with fast add/sub for any inputs (according to Agner Fog's instruction tables), but multiply can still slow down.
Add/sub/multiply are fine with zeros, but potentially a problem with uninitialized junk that might represent NaN or something.
With division, be careful that you aren't dividing by zero in the unused element. That could even raise an FP exception if the divide-by-zero exception is unmasked (in MXCSR for SSE math).
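A minimal sketch of one way to keep the unused lane of a divisor harmless, assuming pixels are held as { x b g r } with the unused element on top. The helper name div_rgb is made up for illustration; the idea is to force the divisor's top lane to 1.0 before dividing, so a zero (or junk) value there can't cause a divide by zero.

```c
#include <xmmintrin.h>  /* SSE:  _mm_div_ps, _mm_and_ps, _mm_or_ps */
#include <emmintrin.h>  /* SSE2: _mm_set_epi32, _mm_castsi128_ps */

/* Hypothetical helper: divide the three colour lanes of num by den,
 * with the divisor's unused top lane replaced by 1.0f. */
static inline __m128 div_rgb(__m128 num, __m128 den)
{
    /* All-ones in lanes 0..2, all-zero in lane 3. */
    const __m128 keep_rgb = _mm_castsi128_ps(_mm_set_epi32(0, -1, -1, -1));
    /* 1.0f in lane 3; the other lanes are +0.0f, i.e. all-zero bits. */
    const __m128 one_top = _mm_set_ps(1.0f, 0.0f, 0.0f, 0.0f);

    /* Clear whatever was in lane 3 of den, then OR in the bits of 1.0f. */
    __m128 safe_den = _mm_or_ps(_mm_and_ps(den, keep_rgb), one_top);
    return _mm_div_ps(num, safe_den);
}
```

The top lane of the result is just the top lane of num divided by 1.0, so a zeroed numerator lane stays zero.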
So yes, keeping the unused element zeroed is probably a good idea. Depending on how you generate things in the first place, this may be pretty cheap to accomplish. (e.g. movd/pinsrd/pinsrd (or insertps) to put three 32-bit elements into a vector, with the initial movd zeroing the high 96 bits.)
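As a rough sketch of building such a vector with intrinsics: this version swaps in a different instruction sequence from the movd/pinsrd one above (a 64-bit movlps load into a zeroed register, a movss load, and movlhps), so that it only needs baseline SSE. The function name load_rgb_zero_top and the r, g, b memory order are assumptions for illustration.

```c
#include <xmmintrin.h>  /* SSE: _mm_loadl_pi, _mm_load_ss, _mm_movelh_ps */

/* Hypothetical helper: read exactly three floats (12 bytes) and return
 * { 0 b g r }, i.e. lanes 0..2 = r, g, b and lane 3 = 0.0f. */
static inline __m128 load_rgb_zero_top(const float *rgb)
{
    /* 64-bit load of r and g into the low half of a zeroed register. */
    __m128 rg = _mm_loadl_pi(_mm_setzero_ps(), (const __m64 *)rgb);
    /* Scalar load of b; movss from memory zeroes the upper three lanes. */
    __m128 b = _mm_load_ss(rgb + 2);
    /* Low half from rg, high half from the low half of b: { 0 b g r }. */
    return _mm_movelh_ps(rg, b);
}
```

This never touches memory past the third float, so it also works when the source data is packed with no padding.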
One workaround could be to store a 2nd copy of the blue channel in the 4th element (or whatever is most convenient to shuffle there). You could load vectors with movsldup (SSE3) / movlps. After movsldup, your register would hold { b b r r }. movlps would re-load the lower 64 bits, so you'd have { b b g r }. (movlpd is equivalent here, BTW. A movsd load is not, because it zeroes the upper 64 bits instead of merging.) Or if the shuffle port is less busy than the load ports, do one 16B load and then shufps. (movsldup on Intel CPUs is a single uop that runs on a load port, even though it has the duplication built in.)
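Both load strategies can be sketched with intrinsics, assuming each pixel occupies 16 bytes in memory as r, g, b, junk (so the register reads { x b g r } from high to low). The function names are made up, and the target attribute is a GCC/clang extension used here so the SSE3 intrinsic compiles without extra flags. Whether the compiler actually folds the load into movsldup is up to it.

```c
#include <xmmintrin.h>  /* SSE:  _mm_loadu_ps, _mm_loadl_pi, _mm_shuffle_ps */
#include <pmmintrin.h>  /* SSE3: _mm_moveldup_ps (movsldup) */

/* Two loads: movsldup, then movlps to merge r and g back into the low half. */
__attribute__((target("sse3")))
static __m128 load_bbgr_moveldup(const float *px)
{
    __m128 dup = _mm_moveldup_ps(_mm_loadu_ps(px));  /* { b b r r } */
    return _mm_loadl_pi(dup, (const __m64 *)px);     /* { b b g r } */
}

/* One load plus a shuffle: copy lane 2 (b) over the junk in lane 3. */
static __m128 load_bbgr_shufps(const float *px)
{
    __m128 v = _mm_loadu_ps(px);                           /* { x b g r } */
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 1, 0));  /* { b b g r } */
}
```

Either way the junk element never reaches an FP math instruction, only loads and shuffles, which don't care what the bits represent.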
Another option would be to pack your pixels into 12 bytes, so a 16B load would get one component of the next pixel. Depending on what you're doing, overlapping stores that clobber one element of the next pixel might or might not be OK. Loading the next pixel before storing the current one could work around that for some ops. It's quite easy to be cache or bandwidth-limited, so saving 1/4 of the space at the small cost of the occasional cache-line split load/store could be worth it.
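Here is a minimal sketch of the load-next-before-store idea for packed 12-byte pixels, using an in-place multiply by a constant as a stand-in for whatever per-element op you're doing. The function name scale_packed_rgb is hypothetical. The last pixel is handled with scalar code, since a 16B load or store there would run past the end of the array.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: _mm_loadu_ps, _mm_storeu_ps, _mm_mul_ps */

/* Hypothetical example: scale n packed RGB pixels (3 floats each) in place. */
static void scale_packed_rgb(float *px, size_t n, float k)
{
    if (n == 0)
        return;

    const __m128 vk = _mm_set1_ps(k);
    const size_t last = n - 1;

    /* Save the last pixel's original values before any store can clobber
     * its first component. */
    const float t0 = px[3 * last], t1 = px[3 * last + 1], t2 = px[3 * last + 2];

    if (n > 1) {
        /* cur = pixel i plus the first component of pixel i+1. */
        __m128 cur = _mm_loadu_ps(px);
        for (size_t i = 0; i < last; i++) {
            __m128 next = cur;
            /* Load the next pixel BEFORE the overlapping store below
             * overwrites its first component with an already-scaled value. */
            if (i + 1 < last)
                next = _mm_loadu_ps(px + 3 * (i + 1));
            _mm_storeu_ps(px + 3 * i, _mm_mul_ps(cur, vk));
            cur = next;
        }
    }

    /* Scalar tail for the last pixel, from the saved originals. */
    px[3 * last]     = t0 * k;
    px[3 * last + 1] = t1 * k;
    px[3 * last + 2] = t2 * k;
}
```

Without the early load, the first component of each pixel would get scaled twice: once by the previous pixel's overlapping store, and again by its own.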