C startup code is only written in assembly confusion

Question

I understand that the C startup code initializes the C runtime environment: it initializes static variables, sets up the stack pointer, etc., and finally branches to main().

They say that this can only be written in assembly language as it's platform-specific. However, can't this still be written in C and compiled for the specific platform?

Function calls would of course not be possible, because we "more than likely" don't have the stack pointer set up at that stage. Beyond that, I can't see any other major reasons. Thanks in advance.

Strictly conforming C code cannot initialize the processor registers (such as the frame pointer), as the C standard does not provide any way to refer to them. — Eric Postpischil, May 30 '21 at 13:25
@EricPostpischil Apart from the stack pointer issue is there any other strict reason why we cannot use lines of C code to write the C startup routine? — Engineer999, May 30 '21 at 13:27
Apart from the stack you might have to do initialization with special instructions that C simply does not support (e.g. switch modes, set up interrupts, memory model and such) — Jester, May 30 '21 at 13:28
I think a C compiler could well maintain the startup assembly snippets for each platform and then place them on demand, but the snippets would only be used in one place and so a library (the standard library) is a more appropriate place for them (?). — Petr Skocik, May 30 '21 at 13:31

score 7 · Answer 1 · answered May 30 '21 at 13:39

Startup code can be written in C only if:

The implementation provides all the intrinsic functions needed to configure hardware features that cannot be set using standard C.
The implementation provides a mechanism for placing fragments of code and data at specific addresses and in a specific order (for example, gcc's support for ld linker scripts).

If both conditions are met, you can write the startup code in C.

I use my own startup code written in C (instead of the one provided by the chip vendor) for Cortex-M microcontrollers, as ARM provides CMSIS header files with all the needed inline assembly functions and the gcc-based toolchain gives me full control over the memory layout.
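As a rough illustration of the memory-layout part, here is a minimal sketch of the two classic start-up jobs (copying `.data` from its load address to RAM and zeroing `.bss`) written in plain C. The names `image_layout`, `copy_data`, `zero_bss` and `init_memory` are hypothetical; on a real target the five addresses would normally come from symbols defined in the linker script, which is an assumption here and not something this answer shows.

```c
#include <stdint.h>

/* Hypothetical description of the image. On a real target these five
 * addresses would typically be symbols exported by the linker script. */
struct image_layout {
    const uint32_t *data_load;  /* initial values of .data (e.g. in flash) */
    uint32_t       *data_start; /* run-time start of .data in RAM */
    uint32_t       *data_end;   /* one past the end of .data */
    uint32_t       *bss_start;  /* start of .bss in RAM */
    uint32_t       *bss_end;    /* one past the end of .bss */
};

/* Copy the initial values of .data to their run-time location, word by word. */
static void copy_data(const uint32_t *load, uint32_t *start, const uint32_t *end)
{
    while (start < end) {
        *start++ = *load++;
    }
}

/* Zero-fill .bss so that uninitialised statics read as 0. */
static void zero_bss(uint32_t *start, const uint32_t *end)
{
    while (start < end) {
        *start++ = 0u;
    }
}

/* The part of a reset routine that prepares memory before main() runs. */
void init_memory(const struct image_layout *img)
{
    copy_data(img->data_load, img->data_start, img->data_end);
    zero_bss(img->bss_start, img->bss_end);
}
```

This code must not itself rely on initialised statics, since it is the code that makes them valid. Depending on optimisation settings, a compiler may also turn such loops into calls to `memcpy` or `memset`, so those routines have to be usable at that point too.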

answered May 30 '21 at 13:39

0___________


For the last part of your answer: Are you talking about STM32? ST provides startup files in assembly. Can there be a specific reason for that? I know some vendors provide them in C. What benefits does migrating them to C provide? – Tagli May 30 '21 at 14:01
Simple. I have much more complicated startup than the standard one. Easier to write and maintain in C than in assembler – 0___________ May 30 '21 at 15:20
@Tagli The vendor supplied start up for STM32 is only partially in assembly (in fact very little). SystemInit() for example is C code. – Clifford May 31 '21 at 14:08
@Tagli ... The assembler is little more than a jump to SystemInit() and then a jump to __main(); the bulk of the assembler file is the vector table, mostly with weak-linked defaults that can be overridden with C code. It is just more succinct to do that in assembler, and it does not rely on language extensions and compiler directives that differ between compilers, although assemblers of course also differ in syntax and directives. – Clifford May 31 '21 at 14:15
@Clifford Startup code also initializes .bss & .data sections. They seem to be a few lines of assembly code. I wonder if it's also possible to write them in C. There is also a jump to `__libc_init_array`, which I guess calls C++ static object *ctors*, but I have no idea how it's implemented. – Tagli May 31 '21 at 14:22
@Tagli yes, I was trying to be succinct, not comprehensive. The segment initialisation is essentially a memory copy and a memory zeroing. They can of course be written in C. The point is that it is incorrect to suggest that the entire C runtime start-up must be written in assembly code. It is clearly possible to write some, most and in some cases all of it in C. The latter would be unusual, even on architectures where it is possible. – Clifford May 31 '21 at 14:31
@Tagli yes, you can write it in C. It is good to add initialization of data placed in other sections as well. `__libc_init_array` is used in C as well: a function marked with `__attribute__((constructor))` will be executed before the call to main. – 0___________ May 31 '21 at 14:40
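To make the last comment concrete, here is a small sketch of the `__attribute__((constructor))` mechanism. It assumes GCC or Clang on a typical ELF target; `early_init` and `early_init_ran` are hypothetical names chosen for the illustration. In a hosted environment the C library's own start-up code runs such constructors before `main()`; in a bare-metal build the start-up routine would have to do that itself, for example by calling `__libc_init_array` as mentioned above.

```c
/* Flag set by the constructor. It is zero-initialised, so it already
 * relies on the .bss zeroing done earlier in start-up. */
static int early_init_done;

/* GCC/Clang extension: registers this function to run before main(). */
__attribute__((constructor))
static void early_init(void)
{
    early_init_done = 1;
}

/* Lets other code check whether the constructor has already run. */
int early_init_ran(void)
{
    return early_init_done;
}
```

If the start-up code never walks the constructor list, `early_init_ran()` would still return 0 inside `main()`, which is one way to notice a missing initialisation step.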

score 7 · Answer 2 · answered May 30 '21 at 14:20

Most of the problem with writing early startup code in C is, in fact, the absence of a properly structured stack. It's worse than just not being able to make function calls. All of a C compiler's generated machine code assumes the existence of a stack, pointed to by the ABI-specified register, that can be used for scratch storage at any time. Changing this assumption would be so much work as to amount to a complete second "back end" for the compiler—way more work than continuing to write early startup code by hand in assembly.

Early bootstrap code, bringing up the machine from power-on, also has to do a bunch of special operations that can't usually be accessed from C, like configuring interrupts and virtual memory. And it may have to deal with the code not having been loaded at the address it was linked for, or the relocation table not having been processed, or other similar problems; these also break pervasive assumptions made by the C compiler (e.g. that it can inject a call to memcpy whenever it wants).

Despite all that, most of a user mode C library's startup code will, in fact, be written in C, for exactly the reason you are thinking. Nobody wants to write more code in assembly, over and over for each supported ISA, than absolutely necessary.

Thanks for your reply. Regarding configuring interrupts and virtual memory: why would that usually not be possible to implement in C? I don't understand this. – Engineer999, May 30 '21 at 15:36
@Engineer999 It involves machine instructions that aren't accessible from C, such as "move to/from control register" and "enable interrupts". It may also involve optimization constraints that aren't possible to enforce in C, e.g. the x86 rule that a MOV CR0 that enables paging must be *immediately* followed by a FAR JMP. *Some* of these can be handled with assembly inserts but, depending on the larger context, it may or may not gain you much in the way of portability or maintainability. — zwol, May 30 '21 at 19:19
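A small sketch of what such an "assembly insert" can look like, using GCC/Clang extended inline assembly. The only real instruction shown is `cpsie i`, which enables interrupts on ARM M-profile cores. The wrapper name `enable_interrupts` and the host-side flag `irq_enabled_model` are hypothetical stand-ins so that the same file still compiles and runs on a desktop machine, where the instruction does not exist.

```c
/* Host-only stand-in for the CPU's interrupt-enable state (hypothetical). */
static volatile int irq_enabled_model;

/* Standard C has no way to express "enable interrupts", so the one machine
 * instruction is wrapped in a compiler-specific inline-assembly statement. */
void enable_interrupts(void)
{
#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
    /* The "memory" clobber stops the compiler moving memory accesses across it. */
    __asm__ volatile ("cpsie i" : : : "memory");
#else
    /* Not an M-profile ARM target: only model the effect in software. */
    irq_enabled_model = 1;
#endif
}

/* Reads back the modelled state; only meaningful in the host build. */
int interrupts_enabled_model(void)
{
    return irq_enabled_model;
}
```

A wrapper like this covers single instructions well. It does not help with constraints such as "these two instructions must be adjacent", because the compiler remains free to place other code between two separate inline-assembly statements.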

score 4 · Answer 3 · Clifford · 2021-05-31T14:24:23.510

A minimal C runtime environment requires a stack, and a jump to a start address. Setting the stack pointer on most architectures requires assembly code. Once a stack is available it is possible to run code generated from C source.

ARM Cortex-M devices load the stack pointer and start address from the vector table on reset, so can in fact boot directly into code generated from C source.
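A minimal sketch of what that looks like in C, assuming a GCC-style toolchain: the vector table is just a constant array whose first entry is the initial stack pointer and whose second entry is the reset handler. The section name `.isr_vector`, the `demo_stack` array and the handler bodies are assumptions for illustration. On a real device the linker script must place this section at the address the core reads on reset, and the stack top would normally be a linker-defined symbol.

```c
#include <stdint.h>

/* Place the table in a named section when the toolchain supports it. */
#if defined(__GNUC__) && defined(__ELF__)
#define VECTOR_SECTION __attribute__((section(".isr_vector"), used))
#else
#define VECTOR_SECTION
#endif

typedef void (*handler_t)(void);

/* One table slot: either the initial stack pointer or a handler address. */
union vector_entry {
    const void *initial_sp;
    handler_t   handler;
};

/* Stand-in for the real stack region (hypothetical). */
static uint32_t demo_stack[64];
static int reset_handler_calls;

/* Placeholder bodies: a real Reset_Handler would initialise memory,
 * call main() and never return. */
void Reset_Handler(void)   { reset_handler_calls++; }
void Default_Handler(void) { }

VECTOR_SECTION
const union vector_entry vector_table[] = {
    { .initial_sp = demo_stack + 64 },  /* entry 0: initial stack pointer */
    { .handler = Reset_Handler },       /* entry 1: reset (start address) */
    { .handler = Default_Handler },     /* entry 2: NMI */
    { .handler = Default_Handler },     /* entry 3: HardFault */
};
```

Because the hardware loads the stack pointer from entry 0 before fetching the first instruction, `Reset_Handler` can be an ordinary C function with no assembly prologue.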

On other architectures, the minimal assembly required is to set the stack pointer and jump to the start address. Thereafter it is possible to write other start-up tasks in C (or even C++). Such start-up code is responsible for establishing the full C runtime, so it must not assume that static initialisation or library initialisation has already happened (there is no heap or filesystem yet, for example); those are things the start-up code itself must do.

In that sense you can run code generated from C source, but the environment is not strictly conforming until main() has been called, so there are some constraints.

Even where assembly code is used, it need not be the whole start-up code that is in assembly.

edited May 31 '21 at 14:24

answered May 31 '21 at 14:06

Clifford


Thanks for the reply. In terms of the vector table you mentioned for Cortex-M devices, are the entries in the interrupt vector table programmable, or fixed? – Engineer999 May 31 '21 at 15:17
@Engineer999 , They are programmable, but as they are located on flash, it's not very practical to do so. However, it's possible to move them into RAM to make them easily modifiable. This needs some hardware support and I'm not sure if it's supported on all platforms. – Tagli May 31 '21 at 15:58
@Tagli So normally the ARM Cortex-M devices come with a fixed address for the stack location which is never "normally" modified? As well as the interrupt handler addresses? – Engineer999 May 31 '21 at 16:02
@Engineer999 Cortex M3 & M4 have a register called `VTOR`, which allows you to place the vector table almost anywhere in flash or RAM (with some alignment constraints). I'm not sure if the implementation of `VTOR` by vendors is mandatory or optional. For STM32F0 (Cortex M0), there is no `VTOR`, but they can remap 0x00 to 0x20000000 (start of RAM). By default, 0x00 is mapped onto 0x08000000 (start of flash). – Tagli May 31 '21 at 16:09
@Engineer999 That's a new question and architecture specific. Ask a new question. On ARM Cortex-M, there is a vector table address register that can be set to relocate the vector table. That could be part of the startup code or done in the application. If you locate it in RAM you can of course modify interrupt handlers at runtime. – Clifford May 31 '21 at 17:06
@Engineer999 w.r.t. stack address, no; the vector table starts with an _initial_ stack pointer and start address. The start-up code may change the SP, or it may be changed dynamically in a multithreaded application with an RTOS. Also, if you have a bootloader, the vector table stack/start will be that of the bootloader. The bootloader will relocate the vector table to that of the application, and set the stack pointer (using assembler) to that of the application. – Clifford May 31 '21 at 17:12
