Currently I have this function to swap the bytes of a value in order to change its endianness:
    template<typename Type, unsigned int Half = sizeof(Type)/2, unsigned int End = sizeof(Type)-1>
    inline void swapBytes(Type& x)
    {
        char* c = reinterpret_cast<char*>(&x);
        char tmp;
        for (unsigned int i = 0; i < Half; ++i) {
            tmp = c[i];
            c[i] = c[End-i];
            c[End-i] = tmp;
        }
    }
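To make the intended behaviour concrete, here is a minimal, self-contained sketch of how the function is meant to be used. The template is repeated so the snippet compiles on its own; `swapped32` and `checkSwap` are hypothetical helpers introduced only for this illustration and are not part of my actual code.

```cpp
#include <cassert>
#include <cstdint>

// Same byte-reversal template as in the question, repeated for self-containment.
template<typename Type, unsigned int Half = sizeof(Type)/2, unsigned int End = sizeof(Type)-1>
inline void swapBytes(Type& x)
{
    char* c = reinterpret_cast<char*>(&x);
    char tmp;
    for (unsigned int i = 0; i < Half; ++i) {
        tmp = c[i];
        c[i] = c[End-i];
        c[End-i] = tmp;
    }
}

// Hypothetical helper (illustration only): returns a byte-swapped copy of v.
inline std::uint32_t swapped32(std::uint32_t v)
{
    swapBytes(v);
    return v;
}

// Hypothetical check (illustration only): reversing the bytes of a 32-bit
// value mirrors its byte order, and swapping twice restores the original.
inline void checkSwap()
{
    assert(swapped32(0x11223344u) == 0x44332211u);
    assert(swapped32(swapped32(0x11223344u)) == 0x11223344u);
}
```

Calling `checkSwap()` should pass on any platform, since reversing the in-memory bytes of a value reverses its byte order whatever the host endianness is.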
Some of my algorithms will call this function several million times, so every instruction that can be avoided is worth saving.
My question is: how can this function be optimized?