You have already accepted an answer, but I will still give mine, which might suit you better (or just as well...). This is what I tested with:
#include <iostream>
using namespace std;

int main()
{
    int a[3] = {22445, 13, 1208132};
    for (int i = 0; i < 3; i++)
    {
        // view the int's storage as raw bytes
        unsigned char *c = (unsigned char *)&a[i];
        cout << (unsigned int)c[0] << endl;
        cout << (unsigned int)c[1] << endl;
        cout << (unsigned int)c[2] << endl;
        cout << (unsigned int)c[3] << endl;
        cout << "---" << endl;
    }
    return 0;
}
...and it works for me. Now, I know you asked for a char array, but this is equivalent. You also asked that c[0] == 0, c[1] == 0, c[2] == 87 and c[3] == 173 for the first value; here the bytes come out in the reverse order, because my machine stores the least significant byte first (little-endian).
Basically, you use the SAME value; you only access it differently.
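To make that concrete, here is a small illustration of my own (not code from your question) that rebuilds the original value from its bytes; it assumes a 4-byte int and the little-endian layout of my machine:

#include <iostream>
using namespace std;

int main()
{
    int v = 22445;                          // 0x000057AD
    unsigned char *c = (unsigned char *)&v;
    // reassemble the value from its four bytes, least significant first
    unsigned int rebuilt = (unsigned int)c[0]
                         | ((unsigned int)c[1] << 8)
                         | ((unsigned int)c[2] << 16)
                         | ((unsigned int)c[3] << 24);
    cout << rebuilt << endl;                // prints 22445 again
    return 0;
}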
Why haven't I used htonl(), you might ask?
Well, since performance is an issue, I think you are better off without it: calling a function to guarantee a particular byte order wastes (precious?) cycles when the bytes may already be in that order on some systems, and when you could have written your code to handle a different order where they are not.
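For comparison, the htonl() route would look roughly like this (assuming the POSIX header <arpa/inet.h>; on Windows it lives in winsock2.h). The per-element function call inside the loop is exactly the cost I am suggesting you avoid:

#include <iostream>
#include <cstdint>
#include <arpa/inet.h>   // htonl()
using namespace std;

int main()
{
    int a[3] = {22445, 13, 1208132};
    for (int i = 0; i < 3; i++)
    {
        uint32_t be = htonl((uint32_t)a[i]);        // force big-endian byte order
        unsigned char *c = (unsigned char *)&be;
        for (int j = 0; j < 4; j++)                 // now always 0 0 87 173 for 22445
            cout << (unsigned int)c[j] << endl;
        cout << "---" << endl;
    }
    return 0;
}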
So instead, you could check the byte order once up front and then run different loops depending on the result (more code, but better performance).
Likewise, if you don't know whether your system uses a 2-byte or a 4-byte int, you can test that beforehand too and again pick the loop accordingly; see the sketch below.
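Here is a sketch of that idea (my own code, assuming int is either 4 or 2 bytes wide, with an endianness probe of my own devising). Both tests run once, so the hot loops themselves stay branch-free:

#include <iostream>
#include <cstring>
using namespace std;

// One-time probe: is the least significant byte stored first?
static bool is_little_endian()
{
    int probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   // look at the lowest-addressed byte
    return first == 1;
}

int main()
{
    int a[3] = {22445, 13, 1208132};

    // Both checks happen once, before any looping.
    if (sizeof(int) == 4 && is_little_endian())
    {
        // Bytes are stored low-to-high, so walk them backwards
        // to get the 0, 0, 87, 173 order you asked for.
        for (int i = 0; i < 3; i++)
        {
            unsigned char *c = (unsigned char *)&a[i];
            for (int j = 3; j >= 0; j--)
                cout << (unsigned int)c[j] << endl;
            cout << "---" << endl;
        }
    }
    else if (sizeof(int) == 4)
    {
        // Big-endian 4-byte int: the bytes already come out in that order.
        for (int i = 0; i < 3; i++)
        {
            unsigned char *c = (unsigned char *)&a[i];
            for (int j = 0; j < 4; j++)
                cout << (unsigned int)c[j] << endl;
            cout << "---" << endl;
        }
    }
    // ...and further branches for a 2-byte int, following the same pattern.
    return 0;
}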
The point is: you will have more code, but you will not waste cycles in the critical area, which is inside the loop.
If you still have performance issues, you can unroll the loop (duplicate the work inside the body and reduce the number of iterations), which will save you a couple more cycles.
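A minimal sketch of what I mean by unrolling (again my own illustration, not code from your question): handle two array elements per iteration, so the loop counter and branch run half as often. It assumes the element count is even; otherwise you would handle the leftover element separately.

#include <iostream>
using namespace std;

int main()
{
    int a[4] = {22445, 13, 1208132, 7};   // even count, so no tail to clean up
    for (int i = 0; i < 4; i += 2)        // half the iterations of the plain loop
    {
        unsigned char *c0 = (unsigned char *)&a[i];
        unsigned char *c1 = (unsigned char *)&a[i + 1];
        for (int j = 0; j < 4; j++)
            cout << (unsigned int)c0[j] << endl;
        cout << "---" << endl;
        for (int j = 0; j < 4; j++)
            cout << (unsigned int)c1[j] << endl;
        cout << "---" << endl;
    }
    return 0;
}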
Note that using c[0], c[1], etc. is equivalent to *(c), *(c + 1) as far as C++ is concerned.