While writing part of a deserializer for a data structure in C, I needed a way to read 16-bit and 32-bit integers. Since this code may be compiled for and used on an architecture that is not little-endian, I wrote helper functions that explicitly decode from little-endian byte order:
#include <stdint.h>

void read_16(uint8_t *data, uint16_t *value) {
    /* Assemble a 16-bit value from two little-endian bytes. */
    *value = data[0] | (data[1] << 8);
}

void read_32(uint8_t *data, uint32_t *value) {
    /* Assemble a 32-bit value from four little-endian bytes. The cast keeps the
       top byte's shift in unsigned arithmetic (data[3] would otherwise be
       promoted to int, and shifting into the sign bit is undefined behavior). */
    *value = data[0] | (data[1] << 8) | (data[2] << 16) | ((uint32_t)data[3] << 24);
}
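For context, the helpers are meant to be used roughly like this (the buffer contents below are just an example, not data from the real format):

uint8_t buf[] = { 0x34, 0x12, 0x78, 0x56, 0x34, 0x12 };
uint16_t a;
uint32_t b;
read_16(buf, &a);      /* a == 0x1234 */
read_32(buf + 2, &b);  /* b == 0x12345678 */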
I was curious how this might be compiled on an architecture that is natively little-endian. arm-none-eabi-gcc with -mcpu=cortex-a9 and -Os gives the following output:
00000000 <read_16>:
0: e5d02001 ldrb r2, [r0, #1]
4: e5d03000 ldrb r3, [r0]
8: e1833402 orr r3, r3, r2, lsl #8
c: e1c130b0 strh r3, [r1]
10: e12fff1e bx lr
00000014 <read_32>:
14: e5903000 ldr r3, [r0]
18: e5813000 str r3, [r1]
1c: e12fff1e bx lr
Question: Is there a reason why the optimizer simplifies the 32-bit read to a load followed by a store, but not the 16-bit one, given that such an operation is valid, would be both shorter and faster, and optimization for size is enabled?
Specifically, I would expect the following assembly for read_16:
ldrh r3, [r0]
strh r3, [r1]
bx lr
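For comparison, I also looked at a memcpy-based variant (a sketch of mine, not part of the original deserializer, and not endianness-neutral on its own since it copies in host byte order); as far as I understand, GCC usually lowers a fixed two-byte memcpy to a single halfword load on this target, so it is shown purely to compare code generation:

#include <stdint.h>
#include <string.h>

void read_16_memcpy(uint8_t *data, uint16_t *value) {
    /* A plain 2-byte copy; the compiler is free to turn this into ldrh/strh. */
    memcpy(value, data, sizeof *value);
}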