First off, assembly language is specific to the assembler (the tool that reads it), not to the target (arm, x86, mips, etc).
Function names are basically labels, which means addresses. Outside high level languages there is no real notion of functions, variable types (unsigned int, float, boolean, etc), or address vs data vs instructions. Assembly generally has no real notion of these concepts, because they don't exist at that level. When computing an offset into a struct in order to access some item, the base address and offset are just numbers when the add happens, not addresses or offsets. The result is only an address for the brief moment that the instruction executes, the one clock cycle when the address is latched and sent through logic toward the bus; otherwise it's just bits.
That said, some assembly languages have declarations that use words like FUNCTION or PROCEDURE, but these are not necessarily like high level languages, where you have clearly divided boundaries.
And then there is compiler generated code vs hand written code, and with hand written code there is no expectation of these notions of boundaries.
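A minimal C sketch of the "just numbers" point, assuming a mainstream platform where a pointer round-trips through uintptr_t. The struct item and the helper read_count_by_hand are hypothetical names made up for illustration, not anything from the compiler output discussed here.

```c
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uintptr_t */
#include <string.h>   /* memcpy */

/* Hypothetical struct, for illustration only. */
struct item {
    unsigned char tag;
    unsigned int  count;
};

/* Read item.count roughly the way generated code does: an ordinary
   integer add, and only the final load treats the sum as an address. */
static unsigned int read_count_by_hand(const struct item *p)
{
    uintptr_t base   = (uintptr_t)p;                  /* just a number */
    uintptr_t offset = offsetof(struct item, count);  /* just a number */
    uintptr_t sum    = base + offset;                 /* ordinary add  */
    unsigned int value;

    /* Assumption: converting the integer back to a pointer is valid on
       this platform (implementation-defined in standard C). */
    memcpy(&value, (const void *)sum, sizeof value);
    return value;
}
```

Calling read_count_by_hand(&it) on a struct item should give the same value as reading it.count directly; the type information only exists in the C source.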
unsigned int fun0 ( void )
{
    return(0x12345678);
}
void fun1 ( unsigned int y )
{
    static unsigned int x=5;
    x=x+y;
}
A particular compiler and command line produce this (disassembly of the compiled and assembled output):
Disassembly of section .text:
00000000 <fun0>:
0: 4800 ldr r0, [pc, #0] ; (4 <fun0+0x4>)
2: 4770 bx lr
4: 12345678
00000008 <fun1>:
8: 4902 ldr r1, [pc, #8] ; (14 <fun1+0xc>)
a: 680a ldr r2, [r1, #0]
c: 1810 adds r0, r2, r0
e: 6008 str r0, [r1, #0]
10: 4770 bx lr
12: 46c0 nop ; (mov r8, r8)
14: 00000000
Disassembly of section .data:
00000000 <fun1.x>:
0: 00000005
The function names are simply labels, which means they are just addresses. From the processor's perspective there is no notion of a label, much less a function.
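A rough C sketch of the same idea from the high level side, assuming a mainstream platform where a function pointer round-trips through uintptr_t (common, but not guaranteed by the C standard). The helper call_through_number is a hypothetical name.

```c
#include <stdint.h>   /* uintptr_t */

unsigned int fun0(void)
{
    return 0x12345678;
}

typedef unsigned int (*fn_t)(void);

/* Take a plain integer, decide to treat it as a code address, and
   branch to it. Nothing in the number itself says "function". */
static unsigned int call_through_number(uintptr_t address)
{
    fn_t f = (fn_t)address;   /* assumption: platform allows this cast */
    return f();
}
```

Passing (uintptr_t)fun0 to call_through_number should behave like calling fun0() directly, since the name was only ever a stand-in for that address.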
So from this view, what is your definition of the boundary of the function? Does it end at the return? If so, then there are items belonging to the function that sit past its return: the literal pool words at 0x4 and 0x14 both come after the bx lr. And the local global (static local) is clearly in the .data section, which is well outside the function.
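The static local can also be observed from C. This is a small sketch, a variant of fun1 that hands back the address of x (fun1_peek is a hypothetical name, not part of the original code), to show that the storage is one fixed location that outlives every call.

```c
/* Variant of fun1 that returns the address of its static local. */
static unsigned int *fun1_peek(unsigned int y)
{
    static unsigned int x = 5;   /* lives in .data, not in the function body or on the stack */
    x = x + y;
    return &x;                   /* safe to return: the storage outlives the call */
}
```

Two successive calls should return the same pointer, and the value behind it should accumulate (5, then 5 plus each y), which is what you would expect from a global that merely has a function-scoped name.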
.globl fun0
.p2align 2
.type fun0,%function
.code 16
.thumb_func
fun0:
.fnstart
ldr r0, .LCPI0_0
bx lr
.p2align 2
.LCPI0_0:
.long 305419896
.Lfunc_end0:
.size fun0, .Lfunc_end0-fun0
.cantunwind
.fnend
If you look at clang's output above, which is aimed basically at the gnu assembler (so this is gnu assembler assembly language), you see the notion of a function, likely for debugger purposes. None of it means anything to the processor, there is no such notion there, nor really to the assembler.
.type fun0,%function
Because this is arm, this serves perhaps as a function definition for high level concepts, but it is also crucial for arm/thumb interwork so that the linker generates the right addresses. It basically tells the assembler to tell the linker that this label is a function label, which in this context means a thumb function label address is the address ORRed with 1, and an arm function label address is ORRed with zero, i.e. unmodified.
They double dipped here because
.thumb_func
fun0:
also takes care of the ORRed with one thing. The .type ... ,%function directive likely adds debugger info for users who want the illusion of debugging a function when they think they are using a debugger on high level code.
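The ORRed with 1 convention can be modeled in a few lines of C. This is only a sketch of the arithmetic, not actual assembler or linker code, and all three function names are hypothetical.

```c
#include <stdint.h>

/* Value a tool would hand out for a function label: thumb labels get
   bit 0 set, arm labels are left unmodified. */
static uint32_t function_symbol_value(uint32_t label_address, int is_thumb)
{
    return is_thumb ? (label_address | 1u) : label_address;
}

/* What bx does with such a value: bit 0 selects the instruction set... */
static int target_is_thumb(uint32_t value)
{
    return (int)(value & 1u);
}

/* ...and the remaining bits are where instructions are fetched from. */
static uint32_t target_fetch_address(uint32_t value)
{
    return value & ~1u;
}
```

For the fun1 label at 0x8 in the disassembly, the thumb symbol value in this model would be 0x9, while instruction fetch still starts at 0x8.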
If you remove
.fnstart
.fnend
nothing bad happens
and for thumb you can remove the .type ... ,%function too. Nobody notices other than perhaps folks using tools related to the high level language (debuggers, etc); the generated code is fine and works fine. (Arm mode does not have a .arm_func equivalent, so you have to use .type ... ,%function to get the linker to work right.)
Outside arm and maybe mips (which has a 32/16 bit instruction set mix as well), I don't know if you would even need to care about those kinds of directives when producing working code.
Here again, assembly is specific to the assembler. A compiler that generates assembly (gnu and others do; using a toolchain is the sane model) obviously needs to generate it for a specific assembler, and is bound by the features there. Users have developed expectations, like the illusion of single stepping through high level language code and other debugging at the high level language rather than at the level of reality. The tools have evolved to provide more debug information throughout the toolchain (compile, assemble, link), so that the final binary, depending on build options, can carry that debug info (and so that, where needed, the code can be left unoptimized so that the debug view works).
On the other questions: inline assembly is a compiler specific feature, not necessarily part of a high level language standard. And it is not real assembly, or let's say it is a new assembly language: the compiler is the tool that reads it, so it can and does vary from the assembly language of the toolchain's assembler. But many compilers, depending on the language, support some form of inline assembly (with no reason to expect it to be compatible across compilers), so in that context you can put instructions into your C code. It's an act of desperation, but technically possible.
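As one hedged illustration, here is what that looks like with the GCC/Clang extended asm syntax. The choice of x86 and the function name add_with_inline_asm are assumptions made for the example; on any other compiler or target the sketch falls back to plain C, which is itself a reminder of how non-portable this is.

```c
/* Assumes GCC or Clang extended asm; other compilers use a different
   syntax, or have no inline assembly at all. */
static unsigned int add_with_inline_asm(unsigned int a, unsigned int b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* AT&T operand order: addl src, dst.
       "+r"(a) makes a both an input and the output register,
       "r"(b) is a read-only input, "cc" says the flags are clobbered. */
    __asm__ ("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    /* Fallback for other compilers/targets: no inline assembly. */
    return a + b;
#endif
}
```

Note that the %0/%1 placeholders and the constraint strings belong to the compiler's language, not the assembler's; the compiler substitutes registers and only then hands text to the assembler.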
LLVM's IR or bytecode is its own instruction set and language, completely separate from any target or high level language. Sane compiler designs have some form of internal structures/code to keep track of the compiled code on its way toward the target output (often assembly language or machine code); it's a whole other beast.
My understanding of llvm is that you use the compiler (clang) as your "assembler", which is disturbing, but that is how they did it. In that view I don't see it as inline assembly but real assembly. In my experience the linker isn't built by default, so gnu's linker is used. And at least with bare metal work the objects are compatible between llvm and gnu binutils, the assembly output from clang or llc is compatible with binutils assembly language (gnu assembler), etc. And the gnu disassembler is superior to llvm's for debugging the compiler/assembler output. llvm has made strides toward doing things internally and not needing binutils, and if you build in the linker then you don't need binutils (for projects where you are doing the steps separately and not just clang hello.c -o hello).