I wrote a very simple memset in C that works fine up to -O2 but not with -O3...
memset:
#include <stddef.h> /* size_t */

void *memset(void *blk, int c, size_t n)
{
    unsigned char *dst = blk;
    while (n-- > 0)
        *dst++ = (unsigned char)c;
    return blk;
}
...which compiles to this assembly when using -O2:
20000430 <memset>:
20000430: e3520000 cmp r2, #0 @ compare param 'n' with zero
20000434: 012fff1e bxeq lr @ if equal return to caller
20000438: e6ef1071 uxtb r1, r1 @ else zero extend (extract byte from) param 'c'
2000043c: e0802002 add r2, r0, r2 @ add pointer 'blk' to 'n'
20000440: e1a03000 mov r3, r0 @ move pointer 'blk' to r3
20000444: e4c31001 strb r1, [r3], #1 @ store value of 'c' to address of r3, increment r3 for next pass
20000448: e1530002 cmp r3, r2 @ compare current store address to calculated max address
2000044c: 1afffffc bne 20000444 <memset+0x14> @ if not equal store next byte
20000450: e12fff1e bx lr @ else back to caller
This makes sense to me. I annotated what happens here.
When I compile it with -O3, the program crashes. My memset calls itself repeatedly until it has eaten the whole stack:
200005e4 <memset>:
200005e4: e3520000 cmp r2, #0 @ compare param 'n' with zero
200005e8: e92d4010 push {r4, lr} @ ? (1)
200005ec: e1a04000 mov r4, r0 @ move pointer 'blk' to r4 (temp to hold return value)
200005f0: 0a000001 beq 200005fc <memset+0x18> @ if equal (first line compare) jump to epilogue
200005f4: e6ef1071 uxtb r1, r1 @ zero extend (extract byte from) param 'c'
200005f8: ebfffff9 bl 200005e4 <memset> @ call myself ? (2)
200005fc: e1a00004 mov r0, r4 @ epilogue start. move return value to r0
20000600: e8bd8010 pop {r4, pc} @ restore r4 and back to caller
I can't figure out how this optimised version is supposed to work without any strb or similar. It doesn't matter whether I set the memory to '0' or to something else, so the function is not being called only on .bss (zero-initialised) variables.
(1) This is a problem. The push is repeated endlessly without a matching pop, because (2) calls the function again whenever it doesn't exit early on 'n' being zero. I verified this with UART prints. Also, r2 is never modified, so why should the comparison with zero ever become true?
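Read back into C, the -O3 listing seems to amount to the following. This is only my own transcription of the disassembly, not compiler output. The name memset_o3, the depth counter and DEPTH_LIMIT are hypothetical additions so that the sketch terminates when run on a host instead of overflowing the stack.

```c
#include <stddef.h>

/* Hypothetical counter and limit, not in the listing: they only stop the
 * recursion so this sketch can run on a host without overflowing the stack. */
static unsigned long depth;
#define DEPTH_LIMIT 1000UL

/* Hand transcription of the -O3 disassembly; 'memset_o3' is a made-up name. */
void *memset_o3(void *blk, int c, size_t n)
{
    if (n == 0)                          /* cmp r2, #0 ... beq epilogue */
        return blk;
    if (++depth >= DEPTH_LIMIT)          /* stand-in for running out of stack */
        return blk;
    memset_o3(blk, (unsigned char)c, n); /* uxtb r1, r1; bl memset ('n' unchanged) */
    return blk;                          /* mov r0, r4; pop {r4, pc} */
}
```

Written this way it is easier to see that 'n' is passed through unchanged and that no byte is ever stored, which matches what I observe.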
Please help me understand what's happening here. Is the compiler assuming prerequisites that I may not fulfill?
Background: I'm using external code that requires memset in my bare-metal project, so I rolled my own. It's only used once on startup and is not performance-critical.
Edit: The compiler is called with these options:
arm-none-eabi-gcc -O3 -Wall -Wextra -fPIC -nostdlib -nostartfiles -marm -fstrict-volatile-bitfields -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=hard -mfpu=neon-vfpv3
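One thing that may be worth trying, assuming the cause is GCC's loop-pattern recognition (-ftree-loop-distribute-patterns) replacing the store loop with a call to memset: switch that pass off for this one function. The sketch below is untested on my target. The name my_memset is a placeholder so it can be built next to a hosted libc; in the bare-metal build it would be named memset. The optimize attribute is GCC-specific, hence the guard. Passing -fno-tree-loop-distribute-patterns on the command line should be the file-wide equivalent.

```c
#include <stddef.h>

/* GCC-specific: disable loop-to-library-call conversion for one function.
 * NO_LOOP_TO_LIBCALL is a hypothetical macro name used only in this sketch. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_TO_LIBCALL \
    __attribute__((optimize("no-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_TO_LIBCALL
#endif

/* Same byte-wise loop as before; 'my_memset' stands in for 'memset'. */
NO_LOOP_TO_LIBCALL
void *my_memset(void *blk, int c, size_t n)
{
    unsigned char *dst = blk;
    while (n-- > 0)
        *dst++ = (unsigned char)c;  /* store the low byte of 'c' */
    return blk;
}
```

If this is the right diagnosis, the -O3 disassembly of the function should contain a strb loop again instead of the bl to itself.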