I had a look at the SSE and MMX instruction sets, and there seem to be no instructions aimed at 3-channel image processing. Of course, for many operations, such as averaging two images, you can use the same instructions regardless of the channel count. But when it comes to operations like unshuffling (deinterleaving) the channels or mixing different channels by a linear transformation, it seems a lot easier to use 32-bit images.
How do the performance characteristics of typical image processing tasks compare for 24-bit vs. 32-bit images?
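
To make the channel-mixing case concrete, here is a minimal scalar sketch in C of the kind of operation I mean. The function name `mix_channels`, the 8.8 fixed-point matrix format, and the `stride` parameter are all made up for illustration; this is only a reference loop, not an SSE implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mixes the first three channels of every pixel in place
 * with a 3x3 matrix given in 8.8 fixed point (256 represents 1.0).
 * stride is the number of bytes per pixel: 3 for 24-bit, 4 for 32-bit images.
 * With stride 4 the fourth byte (alpha or padding) is left untouched. */
void mix_channels(uint8_t *px, size_t n_pixels, size_t stride,
                  const int16_t m[3][3])
{
    assert(stride >= 3);
    for (size_t i = 0; i < n_pixels; ++i) {
        uint8_t *p = px + i * stride;
        /* Read all inputs first so the in-place writes do not disturb them. */
        int32_t c0 = p[0], c1 = p[1], c2 = p[2];
        for (int k = 0; k < 3; ++k) {
            int32_t acc = m[k][0] * c0 + m[k][1] * c1 + m[k][2] * c2;
            acc += 128;              /* round to nearest */
            if (acc < 0) acc = 0;    /* clamp before shifting */
            acc >>= 8;               /* back from 8.8 fixed point */
            if (acc > 255) acc = 255;
            p[k] = (uint8_t)acc;
        }
    }
}
```

The same loop handles both layouts, so the only difference is the stride. With 4 bytes per pixel, exactly four pixels fit into one 16-byte SSE register and every pixel starts at a 4-byte boundary. With 3 bytes per pixel, 16 bytes hold five and a third pixels, so pixels straddle register boundaries, which is why I suspect the 24-bit layout is harder to vectorize.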