
I have a simple C program:

int main(){    
    unsigned int counter = 0;
    ++counter;
    ++counter;
    ++counter;
    return 0;
}

I am using the following compile flags:

arm-none-eabi-gcc -c -mcpu=cortex-m4 -march=armv7e-m -mthumb \
    -mfloat-abi=hard -mfpu=fpv4-sp-d16 -DPART_TM4C123GH6PM -O0 \
    -ffunction-sections -fdata-sections -g -gdwarf-3 -gstrict-dwarf \
    -Wall -MD -std=c99 -c -MMD -MP -MF"main.d" -MT"main.o" -o"main.o"  "../main.c"

(some -I directives removed for brevity)

Note that I'm deliberately using -O0 to disable optimisations because I'm interested in learning what the compiler does to optimise.

This compiles into the following assembly for ARM Cortex-M4:

6           unsigned int counter = 0;
00000396:   2300                movs       r3, #0
00000398:   607B                str        r3, [r7, #4]
7           ++counter;
0000039a:   687B                ldr        r3, [r7, #4]
0000039c:   3301                adds       r3, #1
0000039e:   607B                str        r3, [r7, #4]
8           ++counter;
000003a0:   687B                ldr        r3, [r7, #4]
000003a2:   3301                adds       r3, #1
000003a4:   607B                str        r3, [r7, #4]
9           ++counter;
000003a6:   687B                ldr        r3, [r7, #4]
000003a8:   3301                adds       r3, #1
000003aa:   607B                str        r3, [r7, #4]

Why are so many ldr r3, [r7, #4] and str r3, [r7, #4] instructions generated? And why does r7 need to be involved at all? Can't we just use r3?

Notlikethat
donturner
  • You forgot to enable optimization? You also didn't say what compiler you are using. – Jester Jun 15 '16 at 21:55
  • r7 is apparently used as the base address for local variables, in this case r7+4 is the address of counter. As already commented, it appears that optimization is not enabled. An optimizing compiler may detect that counter is never used outside of main and completely ignore it, so that main() just becomes a return 0. – rcgldr Jun 15 '16 at 22:00
  • You are completely right, if I use `-O1` then all the instructions are removed and the function returns 0. – donturner Jun 15 '16 at 22:15
  • `-O0` spills everything to memory after every statement, so you can change anything in a debugger and it will have the "expected" effect. This is why `-O0` code is so noisy and horrible to read as a human. It's not just "not optimized", it also reflects more about gcc internals than what a simple literal translation to asm by a human would look like. – Peter Cordes Jun 16 '16 at 03:30
  • What you should actually do is write functions that take args and return a result, so they don't optimize away at `-O3`. Do this on http://gcc.godbolt.org/ to get nicely-formatted asm, and an automatic recompile after editing. (Unfortunately the newest ARM compiler Matt has installed is only g++ 4.8, but that's still new enough for C++11 std::atomic if you want to force memory accesses with optimization on.) – Peter Cordes Jun 16 '16 at 03:31
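
The suggestion in the last comment (take the input as an argument and return the result) can be sketched as follows. This is only an illustration: the function name add_three is hypothetical and does not appear in the original question.

```c
/* Hypothetical helper, not from the original post.
   The starting value arrives as an argument and the result is returned,
   so the optimiser cannot simply discard the increments as dead code. */
unsigned int add_three(unsigned int counter)
{
    ++counter;
    ++counter;
    ++counter;
    return counter;
}
```

Compiling this with optimisation enabled (for example -O1 in place of -O0 in the command above) and inspecting the assembly should show the three increments combined into a single add of 3, rather than the function body disappearing.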

2 Answers


Without optimisation (which this clearly is), all the compiler is obliged to do is emit instructions which result in the behaviour defined by the higher level language. It is free to naïvely treat every statement entirely in isolation, and that's exactly what it's doing here; from the compiler's viewpoint:

  • A variable declaration: Well then, I need somewhere to store it, and that I can do by creating a stack frame (not shown, but r7 is being used as a frame pointer here).
  • New statement: counter = 0; - OK, I remember that the storage for counter is in the local stack frame, so I just pick a scratch register, generate the value 0 and store it in that location, job done.
  • New statement: ++counter; - Right then, I remember that the storage for counter is in the local stack frame, so I pick a scratch register, load that with the value of the variable, increment it, then update the value of the variable by storing the result back. The return value is unused, so forget about it. Job done.
  • New statement: ++counter; - Right then, I remember that the storage for counter is in the local stack frame, so I pick a scratch register, load that with the value of the variable, increment it, then update the value of the variable by storing the result back. The return value is unused, so forget about it. Job done. As I am a piece of software I cannot even comprehend the human concept of Déjà vu, much less experience it.
  • New statement: ++counter; - Right then...

And so on. Every statement, perfectly compiled into machine instructions that do precisely the right thing. Exactly what you asked me to do. If you wanted me to reason about the code at a higher level and work out if I can take advantage of the relationships between those statements, you should have said something...
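
A rough way to see the same load/increment/store pattern even with optimisation turned on is to mark the variable volatile, which obliges the compiler to perform every access. The sketch below is an illustration only; the function name count_three_volatile is hypothetical and not part of the original code.

```c
/* Hypothetical example, not from the original post.
   'volatile' forces a real memory access for every read and write of
   'counter', so each ++counter remains a separate load, add and store. */
unsigned int count_three_volatile(void)
{
    volatile unsigned int counter = 0;  /* store 0 to the variable's slot */
    ++counter;                          /* load, add 1, store back */
    ++counter;                          /* load, add 1, store back */
    ++counter;                          /* load, add 1, store back */
    return counter;                     /* one final load */
}
```

The result is still 3; only the number of memory accesses the compiler is allowed to remove changes.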

Notlikethat
  • Or to put it another way, [_"Ooh, a piece of code..."_](https://youtu.be/__i8-aw20C4?t=14) – Notlikethat Jun 16 '16 at 19:34
  • In fact, optimization disabled (the default `-O0`) = debug mode for gcc/clang, so they intentionally compile each C statement to a separate block of asm. This lets GDB's `jump` command move to other source lines and have everything work as if you were jumping around in the C abstract machine. See [Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](https://stackoverflow.com/q/53366394). `-O0` isn't just *not trying* to optimize, it's *required* to anti-optimize, kind of like everything was volatile. – Peter Cordes Apr 23 '20 at 06:36

If the counter variable is not declared volatile, and you set optimization for size (the -Os parameter), gcc will optimize that code to movs rn, #3 followed by str rn, [variable address].
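
A minimal sketch of the situation this answer describes, under one assumption that differs from the question: counter is made a global here, so its final value is observable and the store has to be kept. The function name count_three is hypothetical.

```c
/* Assumption: 'counter' is a global (the question uses a local), so the
   compiler must leave the final value in memory when the function returns. */
unsigned int counter;

/* Hypothetical function name, not from the original post. */
void count_three(void)
{
    counter = 0;
    ++counter;
    ++counter;
    ++counter;
}
```

Because counter is not volatile, the optimiser is free to combine the assignment and the three increments into a single store of the constant 3. With a local variable whose value is never used, as in the question, the whole sequence can be removed instead, which is what the asker reported seeing at -O1.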