Understanding C runtime environment (ARM) - where to start

Question

I'm an embedded developer working mainly with ARM Cortex-M devices. Recently I switched to Linux and decided to learn more about the build/assemble/link process, how to write makefiles etc., as I had been using IDEs (IAR, Keil, Eclipse etc.) where a lot of stuff is automated and the user has no clue what is happening in the background. The goal is to understand more of this low-level process and how to use the tools properly, not just rely on the IDE default settings.

After writing a makefile I was able to build my application. However, I decided to perform the link step manually by invoking the linker directly (not through the compiler), and surprisingly, troubles showed up! Undefined references to libc, the _init function in __libc_init_array, _exit etc. were the issues. After a whole day of investigation I was able to include all the object files (crt0.o, crti.o) and libraries (libnosys.a) manually. Apparently the compiler does this automatically.

After overcoming these hassles I realized that I have absolutely no clue about these internals. Why do I need crt0.o, crti.o etc.? Where do they come from? Are they tied to the compiler/linker, the C runtime libraries, or the system?

I would like to learn more about these internals, but I'm not sure what I am actually looking for. Is it a library, the system, the compiler, or all of them together?

I understand that the system (MCU) needs to initialize the variables in RAM and other stuff, but I'm missing the complete picture. Can you direct me to good books, manuals or readings about these things, please? What am I actually looking for?

Edit:

After the discussion here I have most probably figured out what I need, so I would rephrase my question the following way:

1) I've received an MCU (let's say an STM32F4xx) and I should create a blinking LED example. All this should be done from scratch: own startup code, no use of external libraries etc.

2) In the second step, someone told me that all this has already been done by others (GCC toolchain/standard libraries, MCU vendor startup files etc.). So I just need to understand/link my work with what was done and compare the differences, why they do it that way etc.

All this was answered by @mbjoe + I found some interesting reading: https://www.amazon.com/Embedded-Shoestring-experience-designing-software/dp/0750676094

Thank you all for your help and for pointing me in the right direction!

Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it. — LPs, Apr 27 '17 at 06:53
If you want to see the actual link command, maybe there's a "verbose" mode when using the compiler. Gcc certainly can show you everything it is doing. — Jens, Apr 27 '17 at 06:57
Perhaps I can suggest you go through the Advanced Linux Programming book. I don't have the specific page number in my head, but it clearly covers building Linux for ARM from scratch, from writing a makefile to using tftp to boot the kernel. Your question is very broad. — danglingpointer, Apr 27 '17 at 07:11
Understanding how things work involves understanding what code needs to be executed at program start-up in any microcontroller C program. And in the case of ARM, what the CMSIS does and does not do. Tools like IAR will have the start-up code pre-written for you. You'd have to roll out your own code instead. This is overall rather useful knowledge. — Lundin, Apr 27 '17 at 08:34
However, learning how to link files manually, creating make files etc, instead of having the IDE do it for you, is mostly useless knowledge, as these are meta tasks not related to the actual functionality of the program. Make files is a thing of the past, used for command-line crap. Compiling, linking, debugging etc through 500 obscure command line commands are just painful, completely outdated techniques. These are the things you want an IDE to do for you. — Lundin, Apr 27 '17 at 08:36
Could not agree with @Lundin more. If the tool will create the makefile etc. for you from nice, easy-to-understand GUI forms, why not use them? Surely, nobody actually wants to directly use command-line text for defining all those sections and segments that make up an executable image, given an easier alternative? — ThingyWotsit, Apr 27 '17 at 08:54
Also, any decent environment should have source for the crt libs. — ThingyWotsit, Apr 27 '17 at 08:57
Although it may be painful, sometimes you want to have complete control of what your code is doing, especially in an embedded environment. — BillyJoe, Apr 27 '17 at 09:26
@mbjoe yes, and that is available from IDE environments, (which are often just front-ends on top of the command-line tools). IDE's are just easier and quicker, especially when debugging. — ThingyWotsit, Apr 27 '17 at 09:59
@Lundin That's what I'm looking for, actually. It's good to have a decent knowledge how linker works etc. but not necessary to understand every switch/flag etc. However it's important to understand what is happening under the hood. A lot of you are talking about IDE's, but IDE is not going to do stuff for you. Once you need to do certain thing, you can use IDE for it and it can help you, but you are the one, who needs to do it and you need to know what are you doing and where in the IDE can such thing be achieved. So at the end you need to understand the "low-level" thing. — ST Renegade, Apr 27 '17 at 10:38

score 3 · Accepted Answer · answered Apr 27 '17 at 08:27

The modules you are referring to (crt0.o, crti.o, _init, __libc_init_array, _exit) are prebuilt object files, libraries and functions that ship with the toolchain and its C library (newlib in most arm-none-eabi GCC distributions); IAR and Keil provide their own equivalents. As you say, they are needed to get the environment initialized (global variable initialization, interrupt vector table, etc.) before your main() function runs.

At some point in those libraries/object files there will be a function in C or assembly like this:

void startup(void)
{
    /* ... init code ... */

    main();

    while (1) { }   /* or _exit() */
}

You may look into these examples that build the startup code from scratch:

http://www.embedded.com/design/mcus-processors-and-socs/4007119/Building-Bare-Metal-ARM-Systems-with-GNU-Part-1--Getting-Started

https://github.com/payne92/bare-metal-arm

Maybe this is what I'm looking for. Thank you very much mbjoe, I'll definitely have a look at these links! — ST Renegade, Apr 27 '17 at 10:31

old_timer · Answer 2 · 2021-02-25T22:00:39.187

I've received an MCU (let's say an STM32F4xx) and I should create a blinking LED example. All this should be done from scratch: own startup code, no use of external libraries etc.

I have an MCU, say an STM32F4xx, and I want to blink the LED on PA5 with no libraries, from scratch, nothing external.

blinker01.c

void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
void dummy ( unsigned int );

#define RCCBASE 0x40023800
#define RCC_AHB1ENR (RCCBASE+0x30)

#define GPIOABASE 0x40020000
#define GPIOA_MODER     (GPIOABASE+0x00)
#define GPIOA_OTYPER    (GPIOABASE+0x04)
#define GPIOA_BSRR      (GPIOABASE+0x18)

int notmain ( void )
{
    unsigned int ra;
    unsigned int rx;

    ra=GET32(RCC_AHB1ENR);
    ra|=1<<0; //enable GPIOA
    PUT32(RCC_AHB1ENR,ra);

    ra=GET32(GPIOA_MODER);
    ra&=~(3<<10); //PA5
    ra|=1<<10; //PA5
    PUT32(GPIOA_MODER,ra);
    //OTYPER
    ra=GET32(GPIOA_OTYPER);
    ra&=~(1<<5); //PA5
    PUT32(GPIOA_OTYPER,ra);

    for(rx=0;;rx++)
    {
        PUT32(GPIOA_BSRR,((1<<5)<<0));
        for(ra=0;ra<200000;ra++) dummy(ra);
        PUT32(GPIOA_BSRR,((1<<5)<<16));
        for(ra=0;ra<200000;ra++) dummy(ra);
    }
    return(0);
}

flash.s

.thumb

.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang

.thumb_func
reset:
    bl notmain
    b hang
.thumb_func
hang:   b .

.align

.thumb_func
.globl PUT16
PUT16:
    strh r1,[r0]
    bx lr

.thumb_func
.globl PUT32
PUT32:
    str r1,[r0]
    bx lr

.thumb_func
.globl GET32
GET32:
    ldr r0,[r0]
    bx lr

.thumb_func
.globl dummy
dummy:
    bx lr

linker script flash.ld

MEMORY
{
    rom : ORIGIN = 0x08000000, LENGTH = 0x1000
    ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}

SECTIONS
{
    .text : { *(.text*) } > rom
    .rodata : { *(.rodata*) } > rom
    .bss : { *(.bss*) } > ram
}

This is all built using the GCC/GNU tools:

arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m4 flash.s -o flash.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m4 -mthumb -c blinker01.c -o blinker01.flash.o
arm-none-eabi-ld -o blinker01.flash.elf -T flash.ld flash.o blinker01.flash.o
arm-none-eabi-objdump -D blinker01.flash.elf > blinker01.flash.list
arm-none-eabi-objcopy blinker01.flash.elf blinker01.flash.bin -O binary

To make sure it linked correctly and will boot, check the vector table in the list file:

08000000 <_start>:
 8000000:   20001000 
 8000004:   08000041 
 8000008:   08000047 
 800000c:   08000047 
 8000010:   08000047 
 8000014:   08000047

Apart from the first word (the initial stack pointer), these should be odd numbers: the handler address ORed with one.

08000040 <reset>:
 8000040:   f000 f80a   bl  8000058 <notmain>
 8000044:   e7ff        b.n 8000046 <hang>

08000046 <hang>:
 8000046:   e7fe        b.n 8000046 <hang>

The vector table has to start at 0x08000000 on these STM32 parts (for some vendors you build for zero). On power-up, address zero is mirrored from 0x08000000, so the vectors take you to the proper place in flash.
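The check on the list file can also be written down as arithmetic. This is only a small sketch in C, not part of the original build; the function names are made up for illustration. It shows how a vector entry such as 0x08000041 relates to the handler at 0x08000040:

```c
#include <stdint.h>

/* A Cortex-M vector entry holds the handler address with bit 0 set,
   which marks the target as Thumb code. (Hypothetical helper name.) */
int vector_has_thumb_bit(uint32_t entry)
{
    return (entry & 1u) != 0;
}

/* Address where the handler's first instruction actually lives:
   the vector entry with bit 0 cleared. (Hypothetical helper name.) */
uint32_t vector_handler_address(uint32_t entry)
{
    return entry & ~UINT32_C(1);
}
```

With the listing above, 0x08000041 maps to reset at 0x08000040 and 0x08000047 maps to hang at 0x08000046. An even entry for a handler would mean the table was built wrong.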

As far as the LED goes, make the GPIO pin a push-pull output and turn it off and on; in this case burn some CPU cycles, then change state. Calling a function that is not in blinker01.c forces the compiler to actually perform those counts (rather than using volatile), a simple optimization trick. PUT32/GET32 are a personal preference: they ensure the correct instruction is used. Compilers don't always use the correct instruction, and if the hardware requires an access of a certain size you could get in trouble. Abstracting has more pros than cons, IMO.
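For comparison, here is a rough sketch of the same abstraction written in C instead of assembly. This is a different technique from the PUT32/GET32 helpers in flash.s: it relies on volatile accesses and on the compiler choosing a 32-bit load/store for them, which is exactly the trust the assembly helpers avoid. All function names here are invented for illustration; the bit positions follow the GPIOA_MODER and GPIOA_BSRR usage in blinker01.c.

```c
#include <stdint.h>

/* Write a 32-bit register through a volatile pointer (assumed helper). */
void reg_write32(uintptr_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

/* Read a 32-bit register through a volatile pointer (assumed helper). */
uint32_t reg_read32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}

/* MODER holds two bits per pin; 01 selects general-purpose output.
   Same clear-then-set sequence as the PA5 code in notmain(). */
uint32_t moder_set_output(uint32_t moder, unsigned pin)
{
    moder &= ~(UINT32_C(3) << (pin * 2));
    moder |= UINT32_C(1) << (pin * 2);
    return moder;
}

/* BSRR: writing a bit in the low half sets the pin,
   writing the matching bit in the high half resets it. */
uint32_t bsrr_set(unsigned pin)   { return UINT32_C(1) << pin; }
uint32_t bsrr_reset(unsigned pin) { return UINT32_C(1) << (pin + 16); }
```

For PA5 this gives a MODER mask of 0xC00 cleared and 0x400 set, and BSRR values of 0x20 and 0x200000, matching the constants used in the blinker.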

These parts are fairly simple to configure and use. It is good to learn it this way as well as using the libraries; professionally you may have to deal with both extremes, and perhaps you get to be the one who writes the libraries for others and needs to know both at the same time.

Knowing your tools is about the most important thing, and yes, most folks in this business don't know how to do that: they rely on a tool and work around the warts of the tool or library rather than understand what is going on and/or fix it. The point of this answer is 1) you asked and 2) to show just how easy it is to use the tools.

I could have made it even simpler by getting rid of the functions in assembly and using assembly only as a very simple way to make the vector table. The Cortex-M is such that you can do everything in C except the vector table (which you can, but it is ugly), so it makes sense to use the well-tested, working assembler to create the vector table.

Note cortex-m0 vs the others

 8000074:   f420 6140   bic.w   r1, r0, #3072   ; 0xc00
 8000078:   f441 6180   orr.w   r1, r1, #1024   ; 0x400

The Cortex-M0 (and the M1, if you come across one) are ARMv6-M based, where the rest are ARMv7-M, which has around 150 more Thumb-2 extensions to the Thumb instruction set (formerly undefined instructions used to make variable-length instructions). All the Cortex-Ms run Thumb, but the Cortex-M0 does not support the ARMv7-M specific extensions. You can modify the build to say cortex-m0 instead of cortex-m4 and it will work just fine on the M4; take code like this (patch up the addresses as needed, perhaps the GPIO is different for your specific part, perhaps not), build it for the M0, and it will run on an M0. Just as you need to periodically check that the vector table is being built right, you can examine the disassembly to see that the right flavor of instructions is being used.
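Besides reading the disassembly, the compiler itself can tell you which architecture it thinks it is building for. The following is only a sketch: the function name is made up, and it assumes GCC's predefined __ARM_ARCH_6M__, __ARM_ARCH_7M__ and __ARM_ARCH_7EM__ macros, which are set from the -mcpu/-march option (a Cortex-M4 build typically reports the ARMv7E-M variant of ARMv7-M).

```c
/* Report which M-profile architecture this translation unit targets.
   The __ARM_ARCH_* macros are predefined by GCC from -mcpu/-march;
   target_arch_name is a hypothetical helper for illustration. */
const char *target_arch_name(void)
{
#if defined(__ARM_ARCH_6M__)
    return "ARMv6-M (Cortex-M0/M0+/M1)";
#elif defined(__ARM_ARCH_7M__)
    return "ARMv7-M (Cortex-M3)";
#elif defined(__ARM_ARCH_7EM__)
    return "ARMv7E-M (Cortex-M4/M7)";
#else
    return "not an ARMv6-M/ARMv7-M build";
#endif
}
```

The same macros can guard a #error if a file must never be built for the wrong core.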

From your post It's clear the user should be familiar with all the tools... I see I've quite a lot of holes in my knowledge. Is there a starting point you would recommend to start at? Unfortunately all the tools are quite complex and the manuals are big, so it's not easy just to delve into the topic and jumping from one to the other is not possible. :-) — ST Renegade, Apr 28 '17 at 05:32
work through other items like the systick timer in the core to make the led blink at an accurate rate. one of the timer peripherals, the uart, etc...you have a foundation, be it this code or other examples you can get from mbed or elsewhere (I hope/assume you have a nucleo board or a discovery and not a loose chip you put on a board)... — old_timer, Apr 28 '17 at 12:25
go to ST and get the documentation for the specific chip and board you have, they might also have a programmers manual for the processor, but if you have an stm32f4 then it is a cortex-m4 and go to arms website infocenter.arm.com and get the architectural reference manual and the technical reference manual for the cortex-m4 and armv7-m which is the architecture. the systick timer is in there as well as the instruction set, why the vector table I setup is that way, etc. the st documentation shows the gpio registers and how to configure them, etc. — old_timer, Apr 28 '17 at 12:29
@old_timer, your answer is gold! I learned a lot from your example. Thanks! — jap jap, Feb 25 '21 at 17:56

score 2 · Answer 3 · edited May 23 '17 at 12:17

That's quite a big question, but I'll try to answer it and give you an overview of all steps that are required to turn a "hello world" into an actual arm executable. I'll focus on the commands to show every step rather than explaining every single detail.

#include <stdio.h>

int main()
{
        printf("Hello world!\r\n");
        return 0;
}

I will use gcc on ubuntu 17.04 for this example. arm-none-eabi-gcc (15:5.4.1+svn241155-1) 5.4.1 20160919

1. Preprocessing

It basically takes care of every line starting with a #. To show the output of the preprocessor use arm-none-eabi-gcc -E or arm-none-eabi-cpp.

arm-none-eabi-gcc -E main.c

The output is very long because of all the things that happen when you #include <stdio.h> and it still contains "unreadable" lines like # 585 "/usr/include/newlib/stdio.h" 3

If you use the arguments -E -P -C the output becomes a lot clearer.

arm-none-eabi-gcc -E -P -C main.c -o main-preprocessed.c

Now you can see that #include just copied all the contents from stdio.h to your code.

2. Compiling

This step translates the preprocessed file to assembly instructions, which are still human readable. To stop after this step and keep the assembly output, use -S.

arm-none-eabi-gcc -S main.c

You should end up with a file called main.s that contains your assembly instructions.

3. Assembling

Now it starts to get a lot less human readable. Pass -c to gcc to stop after assembling and keep the object file. This step is also the reason why inline assembly is possible.

arm-none-eabi-gcc -c main.c

You should end up with a main.o file which can be displayed with hexdump or xxd. I would recommend xxd because it shows you the ascii representation next to the raw hexadecimal numbers.

xxd main.o

4. Linking

This is the final stage; after it, your program is ready to be executed by the target system. The linker adds the "missing" code. For example, until now there was no sign of the code for the printf() function or anything else declared in stdio.h.

arm-none-eabi-gcc main.c --specs=nosys.specs -o main

For the --specs=nosys.specs see here: https://stackoverflow.com/a/23922211/2394967
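As one illustration of what the linker pulls in: with newlib, printf() ends up calling a low-level _write() function, and nosys.specs links in a stub for it that simply reports failure. A minimal sketch of supplying your own instead, assuming newlib's usual _write(int, char *, int) signature; console_putc, console_count and the buffer are hypothetical stand-ins for a real UART driver:

```c
#include <stddef.h>

/* Hypothetical output sink. On real hardware console_putc() would
   wait for the UART to be ready and write the data register. */
static char console_buf[64];
static size_t console_len;

static void console_putc(char c)
{
    if (console_len < sizeof console_buf)
        console_buf[console_len++] = c;
}

/* Number of characters captured so far (illustration only). */
size_t console_count(void)
{
    return console_len;
}

/* newlib calls _write() when stdio flushes its buffer; returning
   len tells it that every byte was consumed. */
int _write(int fd, char *ptr, int len)
{
    (void)fd;   /* stdout and stderr go to the same sink here */
    for (int i = 0; i < len; i++)
        console_putc(ptr[i]);
    return len;
}
```

When an object file defining _write is linked in, the linker uses it instead of the libnosys stub, because archive members are only pulled in for symbols that are still undefined.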

This is just a rough overview, but you should be able to find a lot more information on every step here on stackoverflow. (example for the linker: What do linkers do? )

Hello Skynet, thank you very much for your effort writing the whole process step-by-step. In general I'm familiar with these steps. However, what I don't understand is the very low level: when linking happens, what is linked, why it is linked, how all these things get structured etc. This was my case, and I figured out that I should learn more about it to understand it better. Thank you for the link related to the linker, I'll definitely have a look at it! The point was, should I learn more about the linker, the compiler, the standard libraries, or what am I actually looking for? :-) — ST Renegade, Apr 27 '17 at 10:27
The crt0.o, crti.o etc. are part of libc (the c standard library). Details here: https://dev.gentoo.org/~vapier/crt.txt . Other essential pieces of any arm code are probably the CMSIS and the startup code (usually in startup.s). For me the most important knowledge when programming arm devices is the corresponding reference manual and errata sheet. Everything else is the job of the IDE until the IDE does something wrong. — sknt, Apr 27 '17 at 10:47
This is what I figured out, although I was not able to find references to all the functions (like _exit(), apparently not part of libc). However, the question is what needs to be included in such a startup file, what such a startup file should look like, why I need to link these libraries etc. I would like to understand the design a bit, to understand the steps carried out during linking etc. What are the differences between --specs=nosys.specs and --specs=nano.specs etc.? Of course they link different libraries, but what is the difference, what do they offer, when should each be used etc.? — ST Renegade, Apr 27 '17 at 10:54
GCC and its libraries were designed to run on Unix-like OSes. Since you don't run any OS on your MCU you don't need the full-blown library, and that's why there is a smaller version, the "nano" version. As to why you need to link libc (or more precisely, an implementation of the C standard library): because it gets ugly otherwise: http://stackoverflow.com/questions/2548486/compiling-without-libc . Some differences between implementations can be viewed here: http://www.etalabs.net/compare_libcs.html (biased towards musl, but still useful) — sknt, Apr 27 '17 at 11:13

score 2 · Answer 4 · answered Apr 27 '17 at 13:19

I've found this two-part blogpost quite a good and interesting read, explaining exactly the details you are asking for:

https://blogs.oracle.com/ksplice/hello-from-a-libc-free-world-part-1
https://blogs.oracle.com/ksplice/hello-from-a-libc-free-world-part-2

Some of the central points of this:

main() is not the entry point to your program.

The kernel/loader does not "call" any function at all; rather, it sets up the virtual address space, places some data on the stack, and then starts executing the process at an address indicated by the executable file.

Your program cannot return as a function does.

This is a direct consequence of the point above: There is simply no return address on the stack that the program could return to. Instead, the process has to make a syscall to ask the kernel to destroy the process. This syscall is the exit_group() syscall, to be precise.

This is done by creating a software interrupt, which causes a kernel mode interrupt handler to run. This interrupt handler will then manipulate the kernel's data structures to destroy and dispose of the process and release the resources it was holding. While the effect is quite similar to that of a function call (which never returns), the CPU mechanisms used here are quite different.

Note that you do not need to link against any library to make a syscall: the syscall is simply some instructions to load the syscall arguments into CPU registers, followed by an interrupt instruction. The _exit() function that you saw missing in your linking attempts is not the syscall, it's just a wrapper around it. C does not know what a syscall is; the libc wrappers have to use language extensions to be able to make a syscall. That's why you generally link against the libc and call its syscall wrappers instead of making the syscalls directly: it isolates you from the implementation-defined details of making a syscall.

The libc code that runs before and after the invocation of main() generally takes care of loading dynamic libraries, initializing static data (if necessary), calling functions marked with __attribute__((constructor)) or __attribute__((destructor)), calling functions that were registered with atexit(), etc.
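The "code that runs before main()" part is easy to observe with a constructor. A small sketch using the GCC/Clang __attribute__((constructor)) extension mentioned above; the function and variable names are made up:

```c
/* Set by the constructor below, before main() is entered. */
static int constructor_ran;

/* The startup code walks the init array and calls this function
   before main(). On a hosted system libc does that for you. */
__attribute__((constructor))
static void early_setup(void)
{
    constructor_ran = 1;
}

/* Lets other code check that the constructor already executed. */
int constructor_has_run(void)
{
    return constructor_ran;
}
```

On a bare-metal newlib build this only works if the startup code calls __libc_init_array(), which, as far as I know, also calls _init() from crti.o in the usual configuration. That would explain the undefined _init reference seen when linking by hand.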

score 0 · Answer 5 · answered Apr 28 '17 at 12:27

It is not that easy, or at least it should not be done in such a simple way.

Because you use the C language, there are some requirements for the startup code.

You need to take the stack and heap into consideration and set them up properly.
At least a minimalistic vector table is needed.
C data can be initialized (e.g. you can declare int a = 5;), so the startup code should provide the copy routine (from flash to the RAM segment).
C presumes that uninitialized global data is zeroed, so the startup routine should do this as well.
Depending on your application, some additional code may be required.

So, to be able to use C without special limitations, you will have to create at least a startup routine and a linker script in which you declare memory sections, their sizes and boundaries, and calculate the start and end addresses for the initialisation routines.
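The copy and zero steps from the list above can be sketched in a few lines of C. This is only an illustration: in a real startup file the five pointers would come from symbols defined in your linker script (their names vary between projects), and the function would be called from the reset handler before main(). The name runtime_init is made up.

```c
#include <stdint.h>

/* Copy initialized data from its load address (flash) to its run
   address (RAM), then zero the uninitialized data, which is what C
   expects to have happened before main() starts.
   All pointers are assumed to be word aligned. */
void runtime_init(const uint32_t *data_load,
                  uint32_t *data_start, uint32_t *data_end,
                  uint32_t *bss_start, uint32_t *bss_end)
{
    /* .data: initial values such as "int a = 5;" live in flash */
    for (uint32_t *dst = data_start; dst < data_end; )
        *dst++ = *data_load++;

    /* .bss: globals without an initializer must read as zero */
    for (uint32_t *dst = bss_start; dst < bss_end; )
        *dst++ = 0;
}
```

Taking the boundaries as parameters keeps the routine testable on a PC; a startup file would typically pass the addresses of the linker-defined symbols instead.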

In my opinion it is pointless to do it from scratch: you can always amend the supplied scripts and startup files. For ARM microcontrollers CMSIS is probably the very best choice, as it gives you absolute freedom.
