An assembler takes assembly language, processor instructions that are easier for humans to read and write, and turns those into machine code, or binary versions of those instructions.
assembly language vectors.s
.thumb
.globl _start
_start:
.word 0x20001000
.word reset
.word foo
.word foo
.word foo
.word foo
.word foo
.word foo
.thumb_func
reset:
bl fun
.thumb_func
foo:
b foo
.globl dummy
dummy:
bx lr
assemble then disassemble
arm-none-eabi-as vectors.s -o vectors.o
arm-none-eabi-objdump -D vectors.o > vectors.list
related portion of the disassembly
Disassembly of section .text:
00000000 <_start>:
0: 20001000
...
00000020 <reset>:
20: f7ff fffe bl 0 <fun>
00000024 <foo>:
24: e7fe b.n 24 <foo>
00000026 <dummy>:
26: 4770 bx lr
The .word directives are not instructions; they are a way to put data in the binary/output. In this case I am generating a vector table. The disassembler is not showing everything yet; we will see the rest. The assembler has left placeholders, which we will see shortly, for the linker to fill in. So this is what an object looks like: the assembly has been turned into machine code. Assembly bx lr, machine code 0x4770.
There are exceptions to the rule, generally for specific reasons, but it generally does not make sense to have a compiler compile directly to machine code. You have to have an assembler for a target anyway, so it is already there; use it. It is far easier for a compiler writer to debug assembly code than to debug machine code. As for the exceptions, there is the "just because I want to" kind, like answering why you climbed the mountain instead of going around it with "because it was there". And then there is the just-in-time reason, and some others. A JIT needs to get to machine code sooner and/or with one tool/library/driver/etc., so you may see those skip the assembly step, even though that is harder to develop. Often you can test whether a compiler goes through an assembler by renaming your assembler and seeing whether the build breaks. You have to hit the right binary though; the one you run on the command line may be a front for the real one. In the case of gcc, I think the gcc program we run is just a front for cc1 and perhaps another program or two, plus the assembler and linker, all spawned from gcc unless you tell it not to.
So we take our simple entry program, fun.c
#define FIVE 5
unsigned int more_fun ( unsigned int );
void fun ( void )
{
more_fun(FIVE);
}
compile
arm-none-eabi-gcc -mthumb -save-temps -O2 -c fun.c -o fun.o
arm-none-eabi-objdump -D fun.o > fun.list
The first temp file is the output of the preprocessor, which takes the #defines and #includes and basically gets rid of them, producing the file that will be sent to the compiler
# 1 "fun.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "fun.c"
unsigned int more_fun ( unsigned int );
void fun ( void )
{
more_fun(5);
}
Then the compiler itself is called and that compiles to assembly language
.cpu arm7tdmi
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 2
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.code 16
.file "fun.c"
.text
.align 2
.global fun
.code 16
.thumb_func
.type fun, %function
fun:
push {r3, lr}
mov r0, #5
bl more_fun
@ sp needed
pop {r3}
pop {r0}
bx r0
.size fun, .-fun
.ident "GCC: (15:4.9.3+svn231177-1) 4.9.3 20150529 (prerelease)"
Then the assembler is called to turn that into an object. We can see what was produced in the disassembly of the object:
Disassembly of section .text:
00000000 <fun>:
0: b508 push {r3, lr}
2: 2005 movs r0, #5
4: f7ff fffe bl 0 <more_fun>
8: bc08 pop {r3}
a: bc01 pop {r0}
c: 4700 bx r0
e: 46c0 nop ; (mov r8, r8)
Now the bl 0 is not yet real. more_fun is an external label, so the linker will have to come in and fix this, as we will see shortly.
more_fun.c same story
source code
#define ONE 1
unsigned int more_fun ( unsigned int x )
{
return(x+ONE);
}
compiler input
# 1 "more_fun.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "more_fun.c"
unsigned int more_fun ( unsigned int x )
{
return(x+1);
}
compiler output (assembler input)
.cpu arm7tdmi
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 2
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.code 16
.file "more_fun.c"
.text
.align 2
.global more_fun
.code 16
.thumb_func
.type more_fun, %function
more_fun:
add r0, r0, #1
@ sp needed
bx lr
.size more_fun, .-more_fun
.ident "GCC: (15:4.9.3+svn231177-1) 4.9.3 20150529 (prerelease)"
disassembly of the object (assembler output)
Disassembly of section .text:
00000000 <more_fun>:
0: 3001 adds r0, #1
2: 4770 bx lr
Now we link all of these together (there is a reason why it is called a toolchain: compile, assemble, link, a series of tools chained together, the output of one feeding the input of the next)
arm-none-eabi-ld -Ttext=0x2000 vectors.o fun.o more_fun.o -o run.elf
arm-none-eabi-objdump -D run.elf > run.list
arm-none-eabi-objcopy -O srec run.elf run.srec
Disassembly of section .text:
00002000 <_start>:
2000: 20001000
2004: 00002021
2008: 00002025
200c: 00002025
2010: 00002025
2014: 00002025
2018: 00002025
201c: 00002025
00002020 <reset>:
2020: f000 f802 bl 2028 <fun>
00002024 <foo>:
2024: e7fe b.n 2024 <foo>
00002026 <dummy>:
2026: 4770 bx lr
00002028 <fun>:
2028: b508 push {r3, lr}
202a: 2005 movs r0, #5
202c: f000 f804 bl 2038 <more_fun>
2030: bc08 pop {r3}
2032: bc01 pop {r0}
2034: 4700 bx r0
2036: 46c0 nop ; (mov r8, r8)
00002038 <more_fun>:
2038: 3001 adds r0, #1
203a: 4770 bx lr
The linker has resolved the external label, in this case by modifying the instruction so it holds the correct offset.
4: f7ff fffe bl 0 <more_fun>
202c: f000 f804 bl 2038 <more_fun>
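That before/after pair can be checked by hand. Here is a rough sketch in Python (not one of the tools used in this answer; the function name is made up for illustration), assuming the original two-halfword thumb bl encoding used by the arm7tdmi, where each halfword carries 11 bits of the offset and pc reads as the instruction address plus 4:

```python
def thumb_bl_target(addr, hw1, hw2):
    """Return the branch target of a thumb bl located at addr.

    Assumes the ARMv4T style encoding: hw1 = 11110 + upper 11 offset bits,
    hw2 = 11111 + lower 11 offset bits, offset counted in halfwords.
    """
    if hw1 >> 11 != 0b11110 or hw2 >> 11 != 0b11111:
        raise ValueError("not a thumb bl instruction pair")
    offset = ((hw1 & 0x7FF) << 12) | ((hw2 & 0x7FF) << 1)
    if offset & (1 << 22):          # sign bit of the 23-bit byte offset
        offset -= 1 << 23
    return addr + 4 + offset        # thumb pc is two halfwords ahead


# Linked binary: the bl in fun at 0x202c, and the bl in reset at 0x2020.
print(hex(thumb_bl_target(0x202C, 0xF000, 0xF804)))
print(hex(thumb_bl_target(0x2020, 0xF000, 0xF802)))
```

Fed the object file's placeholder (f7ff fffe), the same function returns the address of the bl itself, since the encoded offset is -4. That is a reminder that the value in the object is only filler until the linker patches it.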
The elf file format is one type of "binary" file; it is binary in that if you open it with a text editor you see some text but mostly garbage. There are other "binary" file formats, like Motorola S-record, which in this case only includes the real stuff, the machine code and any data, whereas the elf also carries extra information for debugging, such as the strings "fun", "more_fun", etc. that the disassembler happened to use to make the output a little prettier. Motorola S-record and Intel Hex are ascii file formats, like this:
S00B000072756E2E73726563C4
S113200000100020212000002520000025200000D1
S113201025200000252000002520000025200000A8
S113202000F002F8FEE7704708B5052000F004F858
S10F203008BC01BC0047C04601307047EA
S9032000DC
These are not used as much anymore, but they are not completely useless; you used to need this kind of format to program a rom. It is the personal preference of the tool makers as to what file formats they support. How does the binary get burned into the flash in a microcontroller? Some tool takes those bits from the host/development machine and, through some interface and some software, moves them to the target. What binary file formats does that tool support? It is up to whoever wrote the tool to choose one or more formats.
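To show how little there is to the S-record lines above, here is a minimal sketch of a parser in Python. The function name is hypothetical, and it assumes the usual layout: "S", a type digit, a byte count, a big-endian address, the data, and a checksum that is the one's complement of the low byte of the sum of everything after the type digit.

```python
# Address width in bytes for each record type (assumed standard layout).
ADDRESS_BYTES = {"0": 2, "1": 2, "2": 3, "3": 4, "5": 2, "7": 4, "8": 3, "9": 2}


def parse_srec(line):
    """Split one S-record line into (type, address, data bytes)."""
    line = line.strip()
    if not line.startswith("S") or line[1] not in ADDRESS_BYTES:
        raise ValueError("not an S-record line")
    raw = bytes.fromhex(line[2:])
    count, body, checksum = raw[0], raw[1:-1], raw[-1]
    if count != len(raw) - 1:
        raise ValueError("byte count does not match line length")
    if (~(count + sum(body))) & 0xFF != checksum:
        raise ValueError("bad checksum")
    n = ADDRESS_BYTES[line[1]]
    return line[1], int.from_bytes(body[:n], "big"), body[n:]


rtype, address, data = parse_srec("S113200000100020212000002520000025200000D1")
# The first two vector table entries, stored little endian in the image.
print(hex(address), hex(int.from_bytes(data[0:4], "little")),
      hex(int.from_bytes(data[4:8], "little")))
# The S0 header record just carries a name.
print(parse_srec("S00B000072756E2E73726563C4")[2])
```

The first data record starts at 0x2000 and begins with the same words the linked disassembly shows at _start. The header record holds the output file name.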
Back before compilers were affordable in various ways (the cost to buy one and/or the storage space to hold the program on your computer, plus intermediate data, etc.), assemblers could be used to make a whole program. You see directives like .org 100h for that. In a "toolchain" the assembler might still have that feature, but as part of the chain its job is to get from assembly language to an object format, doing most of the conversion to machine code and other data. It is certainly possible for a compiler to do all the work and output a finished binary, but when it is part of a toolchain the sane method is for it to get from the source code to assembly language. The compiler tools we are used to (gcc, msvc, clang, etc.) will, unless told not to, spawn the assembler and the linker for us as well as the compiler, making it seem like the compiler went from source to final binary in one magical step. The linker takes the individual objects, some of which have unresolved external labels, decides where in the memory image they will go, and resolves externals as needed. How much the linker does is very much part of the system design for these tools; the design can be such that the linker does not modify individual instructions and only places addresses in agreed-upon places. An example of this:
vectors.s
.globl _start
_start:
bl fun
b .
.global hello
hello: .word 0
fun.c
#define FIVE 5
extern unsigned int hello;
void fun ( void )
{
hello+=FIVE;
}
fun.o disassembly
Disassembly of section .text:
00000000 <fun>:
0: e59f200c ldr r2, [pc, #12] ; 14 <fun+0x14>
4: e5923000 ldr r3, [r2]
8: e2833005 add r3, r3, #5
c: e5823000 str r3, [r2]
10: e12fff1e bx lr
14: 00000000 andeq r0, r0, r0
So we can see that it is loading a number from offset/address 0x14 into r2. That number is then used as an address to read hello, what was read has 5 added to it, and then the address in r2 is used to save hello back to memory. So what is at 0x14 is a placeholder left by the compiler so the linker can place the address of hello there, which we see once linked
Disassembly of section .text:
00002000 <_start>:
2000: eb000001 bl 200c <fun>
2004: eafffffe b 2004 <_start+0x4>
00002008 <hello>:
2008: 00000000 andeq r0, r0, r0
0000200c <fun>:
200c: e59f200c ldr r2, [pc, #12] ; 2020 <fun+0x14>
2010: e5923000 ldr r3, [r2]
2014: e2833005 add r3, r3, #5
2018: e5823000 str r3, [r2]
201c: e12fff1e bx lr
2020: 00002008 andeq r2, r0, r8
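The address in the disassembler's comment on the ldr line is simple arithmetic. A small sketch in Python (the function name is made up for illustration), assuming the classic arm encoding of ldr with an immediate offset from pc, where pc reads as the instruction address plus 8:

```python
def arm_ldr_literal_address(addr, insn):
    """Return the address an arm 'ldr rd, [pc, #imm]' at addr loads from."""
    # Expect: ldr, immediate offset, pre-indexed, word, no writeback, rn = pc.
    if insn & 0x0F7F0000 != 0x051F0000:
        raise ValueError("not a pc-relative ldr with immediate offset")
    imm = insn & 0xFFF
    up = insn & (1 << 23)           # U bit: add or subtract the offset
    return addr + 8 + (imm if up else -imm)


# The same instruction word before and after linking.
print(hex(arm_ldr_literal_address(0x0000, 0xE59F200C)))
print(hex(arm_ldr_literal_address(0x200C, 0xE59F200C)))
```

The instruction word is identical in the object and in the linked binary. Only the data word it points at changed, from 0 to 0x2008.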
0x2020 now holds the address of hello. The compiler built the program such that this address could easily be filled in by the linker, and the linker filled it in. It is certainly possible to do this with branch/jump addresses as well, and different toolchains, or different targets from the same tools, will produce different solutions; it usually has to do with the instruction set. Say you have one with a near call (relative) and a far call (absolute): do you compile externals with a far call so it always works? Or do you build for a near call and take the risk that the linker has to put a trampoline in?
Not that exact thing but I can make gcc do this for thumb/arm fairly easily.
.thumb
.globl _start
_start:
bl fun
b .
.global hello
hello: .word 0
#define FIVE 5
extern unsigned int hello;
void fun ( void )
{
hello+=FIVE;
}
disassembly of linked binary
00002000 <_start>:
2000: f000 f812 bl 2028 <__fun_from_thumb>
2004: e7fe b.n 2004 <_start+0x4>
00002006 <hello>:
2006: 00000000 andeq r0, r0, r0
...
0000200c <fun>:
200c: e59f200c ldr r2, [pc, #12] ; 2020 <fun+0x14>
2010: e5923000 ldr r3, [r2]
2014: e2833005 add r3, r3, #5
2018: e5823000 str r3, [r2]
201c: e12fff1e bx lr
2020: 00002006 andeq r2, r0, r6
2024: 00000000 andeq r0, r0, r0
00002028 <__fun_from_thumb>:
2028: 4778 bx pc
202a: 46c0 nop ; (mov r8, r8)
202c: eafffff6 b 200c <fun>
Because of the way this specific instruction set works, you can't get from thumb code to arm code using the bl (basically call) instruction; you have to use bx, which is just a branch (jump), not a call. So the linker placed a trampoline, some code used to bounce from one thing to another, in for us.
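The two hops of the trampoline can be followed with a little arithmetic. This is only a sketch in Python with made-up function names, assuming that in thumb state pc reads as the instruction address plus 4, that in arm state it reads as plus 8, and that an arm b/bl holds a signed 24-bit word offset:

```python
def sign_extend(value, bits):
    """Interpret the low 'bits' bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def thumb_bx_pc_target(addr, insn):
    """Where a thumb 'bx pc' at addr lands; bit 0 clear means arm state."""
    if insn != 0x4778:
        raise ValueError("not bx pc")
    return addr + 4


def arm_b_target(addr, insn):
    """Branch target of an arm b or bl instruction located at addr."""
    if (insn >> 25) & 0b111 != 0b101:
        raise ValueError("not an arm b/bl")
    return addr + 8 + (sign_extend(insn & 0xFFFFFF, 24) << 2)


hop1 = thumb_bx_pc_target(0x2028, 0x4778)   # leave thumb state
hop2 = arm_b_target(hop1, 0xEAFFFFF6)       # plain arm branch to fun
print(hex(hop1), hex(hop2))
```

The bx pc lands on the arm branch two halfwords later, which is why the nop is there as padding, and that branch goes to fun at 0x200c.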
Not all instruction sets are easy to disassemble, and/or the toolchain doesn't include a disassembler; it's not a required part of a toolchain. But you can and should repeat this using gnu and other tools, for this or other targets. As you can see, I don't need special hardware, and I don't have to write more than a dozen or so lines of code to see these tools at work.