I am in the "get the tools and just try it" camp.
unsigned int fun ( unsigned int a, unsigned int b )
{
return(a+b+1);
}
As mentioned, the problem is the optimizer. If you don't optimize, you get something like this:
arm-none-eabi-gcc -c fun.c -o fun.o
arm-none-eabi-objdump -D fun.o
00000000 <fun>:
0: e52db004 push {r11} ; (str r11, [sp, #-4]!)
4: e28db000 add r11, sp, #0
8: e24dd00c sub sp, sp, #12
c: e50b0008 str r0, [r11, #-8]
10: e50b100c str r1, [r11, #-12]
14: e51b2008 ldr r2, [r11, #-8]
18: e51b300c ldr r3, [r11, #-12]
1c: e0823003 add r3, r2, r3
20: e2833001 add r3, r3, #1
24: e1a00003 mov r0, r3
28: e24bd000 sub sp, r11, #0
2c: e49db004 pop {r11} ; (ldr r11, [sp], #4)
30: e12fff1e bx lr
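To see how directly that listing maps back to C, here is a rough C model of it. It is a sketch for illustration only, not anything the compiler produces: the names `stack_mem`, `sp` and `fun_unoptimized_model` are hypothetical, `stack_mem` stands in for RAM, `sp` is a word index rather than a byte address, and each statement is commented with the instruction it mirrors.

```c
#include <stdint.h>

static uint32_t stack_mem[16];  /* hypothetical RAM for the stack, one word per slot */
static uint32_t sp = 16;        /* word index, starts at the top; the stack grows down */

uint32_t fun_unoptimized_model ( uint32_t r0, uint32_t r1 )
{
    uint32_t r2, r3;
    uint32_t r11 = 0xDEADBEEFu; /* stand-in for whatever the caller had in r11 */

    stack_mem[--sp] = r11;      /* push {r11} */
    r11 = sp;                   /* add r11, sp, #0  : frame pointer */
    sp -= 3;                    /* sub sp, sp, #12  : 3 words for locals */
    stack_mem[r11 - 2] = r0;    /* str r0, [r11, #-8]  : save a */
    stack_mem[r11 - 3] = r1;    /* str r1, [r11, #-12] : save b */
    r2 = stack_mem[r11 - 2];    /* ldr r2, [r11, #-8]  : reload a */
    r3 = stack_mem[r11 - 3];    /* ldr r3, [r11, #-12] : reload b */
    r3 = r2 + r3;               /* add r3, r2, r3 */
    r3 = r3 + 1;                /* add r3, r3, #1 */
    r0 = r3;                    /* mov r0, r3 : return value goes in r0 */
    sp = r11;                   /* sub sp, r11, #0 : throw away the locals */
    r11 = stack_mem[sp++];      /* pop {r11} : restore the caller's r11 */
    return(r0);                 /* bx lr */
}
```

Note the round trip: a and b are stored and then immediately loaded back, which is exactly the work the optimized build below skips.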
It is actually quite readable, but it takes more work to follow than the optimized version:
arm-none-eabi-gcc -O2 -c fun.c -o fun.o
arm-none-eabi-objdump -D fun.o
00000000 <fun>:
0: e2811001 add r1, r1, #1
4: e0810000 add r0, r1, r0
8: e12fff1e bx lr
You will start to get a feel for how to write simple code that doesn't get optimized out, which is another good lesson IMO. Unoptimized code, at least with GCC, has a strong desire to set up a stack frame and then save the passed-in operands to the stack. Any local or intermediate variables it thinks it needs also get stack space. Every line of C code is handled separately and in order, from the stack: the operands are loaded from the stack and the results are stored back, even if the very next line reads the same variable right back into a register. Hence the term optimized: the optimizer can easily remove a lot of that code by keeping things in registers.
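A small experiment file for practicing that, assuming you build it with -O2 and disassemble it the same way as above. The function names are hypothetical. The comments describe what the optimizer will most likely do, but check your own objdump output rather than trusting them; as noted below, compilers and versions differ.

```c
/* Everything is known at compile time, so the optimizer will most
   likely reduce this to returning the constant 4. */
unsigned int folded ( void )
{
    unsigned int a = 1;
    unsigned int b = 2;
    return(a+b+1);
}

/* The operands only arrive at run time (in r0 and r1), so the adds
   have to survive optimization. */
unsigned int kept ( unsigned int a, unsigned int b )
{
    return(a+b+1);
}

/* volatile forces a real store and reload of a, so you should still
   see some stack traffic even with -O2. */
unsigned int kept_volatile ( void )
{
    volatile unsigned int a = 1;
    unsigned int b = 2;
    return(a+b+1);
}
```

The same idea is why the calling-convention example below only declares more_fun instead of defining it: if the callee's body were visible in the same file, the optimizer could inline it and the call you wanted to study would disappear.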
You can also compile without the stack frame; that is an exercise for the reader, and a very simple Google search away.
You can also begin to see the calling convention in action if you can work around the optimizer (or can tolerate all the stack traffic if you don't optimize).
unsigned int more_fun ( unsigned int, unsigned int );
unsigned int fun ( unsigned int a, unsigned int b )
{
return(more_fun(a+1,b+2)+3);
}
0: e92d4010 push {r4, lr}
4: e2811002 add r1, r1, #2
8: e2800001 add r0, r0, #1
c: ebfffffe bl 0 <more_fun>
10: e8bd4010 pop {r4, lr}
14: e2800003 add r0, r0, #3
18: e12fff1e bx lr
The immediate 1 is added to r0, so r0 must hold our first parameter. The 2 goes with r1, the second parameter. The 3 is added to r0 after the called function returns, so r0 must be where the return value goes for simple functions like this. (64-bit return values, structures and so on are an exercise for the reader. You can also read ARM's recommended calling convention, but 1) compilers can do whatever they want, and 2) sometimes it is much easier just to compile a function and disassemble it.)
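For that exercise, here is one possible probe file, a sketch that assumes GCC or Clang. The callees (more_fun, more_fun64, more_pair) and struct pair are hypothetical and only exist so the file links and runs natively. `__attribute__((noinline))` is a GCC/Clang extension used here to keep each call visible in the disassembly; putting the callees in a separate file, as in the answer, works too. Compile with -O2 -c, objdump it, and see which registers carry the 64-bit halves and whether the struct comes back in registers or through memory.

```c
#include <stdint.h>

/* Hypothetical callees; noinline keeps the calls from being folded away. */
__attribute__((noinline)) unsigned int more_fun ( unsigned int a, unsigned int b )
{
    return(a+b);
}

__attribute__((noinline)) uint64_t more_fun64 ( uint64_t a )
{
    return(a<<1);
}

struct pair { uint32_t x, y; };

__attribute__((noinline)) struct pair more_pair ( uint32_t a )
{
    struct pair p = { a, a+1 };
    return(p);
}

/* The function from the answer: r0/r1 going in, r0 coming back. */
unsigned int fun ( unsigned int a, unsigned int b )
{
    return(more_fun(a+1,b+2)+3);
}

/* 64-bit argument and return value: which registers hold the two halves? */
uint64_t fun64 ( uint64_t a )
{
    return(more_fun64(a+1)+3);
}

/* 8-byte struct return: registers, or memory via a hidden pointer? */
uint32_t fun_pair ( uint32_t a )
{
    struct pair p = more_pair(a+1);
    return(p.x+p.y);
}
```

With these stand-in callees, fun(1,2) is more_fun(2,4)+3 = 9, which makes it easy to sanity check the file natively before cross-compiling it.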
The other mystery here is why r4 is pushed. Or maybe your compiler pushes r3 or some other register along with lr, or maybe it doesn't push one at all. This is another SO question asked many times over by people who didn't look for the answer first (granted, it is hard to search for). ARM recommends keeping the stack 64-bit aligned; r4 in this case is just an arbitrary register, it could have been almost any of them, the compiler just needed to push two and pop two. Why, then, in the earlier example didn't it push another register along with r11? Apparently, one, GNU didn't see the need to worry about stack alignment during interrupts, and two, the stack adjustment made while building the stack frame compensates to restore 64-bit alignment; you just have that interrupt exposure for a few instructions. I don't know what ARM recommends with respect to that.
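The alignment arithmetic behind that is simple enough to check by hand. This is a toy model, not real hardware: it assumes each pushed register takes 4 bytes, the stack grows down, and sp starts 8-byte aligned. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Each ARM register pushed occupies 4 bytes; the stack grows toward lower addresses. */
uint32_t push_regs ( uint32_t sp, unsigned int nregs )
{
    return(sp - 4u*nregs);
}

bool is_8_byte_aligned ( uint32_t sp )
{
    return((sp & 7u) == 0);
}

/* push {r4, lr}: 8 bytes, so alignment is preserved. */
uint32_t sp_after_push_r4_lr ( uint32_t sp )
{
    return(push_regs(sp, 2));
}

/* push {lr} (or push {r11}) alone: 4 bytes, so sp is left misaligned. */
uint32_t sp_after_push_one ( uint32_t sp )
{
    return(push_regs(sp, 1));
}

/* Unoptimized frame: push {r11} then sub sp, sp, #12, 16 bytes in total.
   sp is misaligned only between those two instructions. */
uint32_t sp_after_unoptimized_frame ( uint32_t sp )
{
    return(push_regs(sp, 1) - 12u);
}
```

Starting from sp = 0x20001000, pushing two registers gives 0x20000FF8 (aligned), pushing one gives 0x20000FFC (not aligned), and the full unoptimized frame gives 0x20000FF0 (aligned again), which is the short window of exposure described above.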
You can take all the existing code you want, from a project of any size, and try to read the disassembly. Depending on how it was built, you may end up with a lot of stack traffic like the unoptimized code. You may also end up with lots of nice optimizations related to loading immediates into registers: constants that can't be encoded in one instruction but that the compiler doesn't want to burn a PC-relative load on. And maybe this is exactly what you are after: what real-world code looks like, and whether you can read it. You will want both ends of it. Unoptimized code has a LOT more instructions but is a somewhat linear translation from C to machine code. Optimized code has reordered operations, dead code elimination, tricks related to immediates and other math, tail-call optimizations, and other things. If you venture into mixed ARM/Thumb code, more fun happens. Add floating point, and more fun happens.
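As an example of the immediate tricks, here is a sketch of the classic ARM (A32) rule for data-processing immediates: an 8-bit value rotated right by an even amount. The helper names are hypothetical, and Thumb-2 uses a different encoding scheme, so this only applies to ARM-state code like the listings above. Anything that fails this test has to be built some other way, for example with two instructions, with mvn of the inverted value, with movw/movt on cores that have them, or with a PC-relative load from a literal pool.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate left by n (mod 32); n == 0 is special-cased to avoid a 32-bit shift. */
static uint32_t rol32 ( uint32_t x, unsigned int n )
{
    n &= 31u;
    return(n ? (x << n) | (x >> (32u - n)) : x);
}

/* value == imm8 ROR rot (rot even, 0..30) exactly when rotating value
   LEFT by rot gives back something that fits in 8 bits. */
bool arm_imm_encodable ( uint32_t value )
{
    for (unsigned int rot = 0; rot < 32u; rot += 2u)
    {
        if (rol32(value, rot) <= 0xFFu) return(true);
    }
    return(false);
}
```

For instance 0x3FC (0xFF shifted left by 2) and 0xF000000F (0xFF rotated right by 4) fit, while 0x1FE (0xFF shifted left by 1, an odd rotation) and 0x101 (bits 9 apart) do not.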
There is no reason to expect any two different brands of compiler (GNU, LLVM, etc.) to produce the same output from the same input, nor is there any reason to expect any two versions of the same brand (for lack of a better term) of compiler to produce the same results from the same source with the same command-line options. So that again multiplies the fun.
Bottom line: the tools have been there all along; it is just a matter of using them.