3

In an interview, I was asked to implement big_to_little_endian() as a macro. I implemented using shift operator. But the interviewer want me to optimize this further. I could not do it. Later I googled & searched but could not find it. Can someone help in understanding how to further optimize this code?

#define be_to_le (((x) >> 24) | (((x) & 0x00FF0000) >> 8) | (((x) & 0x0000FF00) << 8) | ((x) << 24))
legends2k
  • 31,634
  • 25
  • 118
  • 222
0x07FC
  • 523
  • 1
  • 6
  • 33
  • 1
    Looks good to me. Perhaps he was looking for some ASM 'shuffle' instruction to be used? Should a good compiler not do this already? – leppie Feb 05 '15 at 06:05
  • 4
    Aside: Instead of reinventing the wheel, one should use [`ntohl`](http://linux.die.net/man/3/ntohl) or the equivalent provided by your platform. You might have added this point politely once you wrote that answer :) – legends2k Feb 05 '15 at 06:53
  • 3
    [This answer](http://stackoverflow.com/a/105339/183120) shows the fastest way by using compiler intrinsics. – legends2k Feb 05 '15 at 07:06

1 Answers1

3

He might have been referring to using a 16-bit op to swap the top two words then using 8-bit ops to swap the bytes in them -- saves a couple instructions, easiest done in a union, though C technically doesn't like it (but many compilers will accept it), and it still compiler dependent since you are hoping the compiler optimizes a couple things out:

union dword {
  unsigned int i;
  union shorts {
    unsigned short s0, s1;
    union bytes {
      unsigned char c0, c1, c2, c3;
    } c;
  } s;
};

union dword in = (union dword)x;
union dword temp = { x.s.s1, x.s.s0 };
union dword out = { temp.s.c.c1, temp.s.c.c0, temp.s.c.c3, temp.s.c.c2 };

Not even valid C, but you get the idea (and I don't think the compiler will even emit what I'm hoping it will).

Or you can save an op, but introduce a data dependency so probably runs slower.

temp = (x << 16) | ( x >> 16)
out = ((0xff00ff00 & temp) >> 8) | (0x00ff00ff & temp) << 8)

Best is just use the compiler intrinsic since it maps to a single bswap instruction.

JasonN
  • 1,339
  • 1
  • 15
  • 27