This is a simple program in C.

char a;
void main(){};

Compiling it generated assembly that starts with:

.text
.globl  a
.bss
.type   a, @object
.size   a, 1

I would like to know how to interpret the above.

I see .text. I believe the . is just a prefix and text means the start of the code section. Then I see .globl, so I believe the variables or functions that come right after it will be global. Or do I need to write the section name, i.e. .text, right before every variable and function? That is my first question.

Then I see .bss. I believe that after .bss, all uninitialized variables and functions are declared.

Finally I see something that matches the global variable char a from my C program:

.type   a, @object

So .type tells what kind of thing it is. I assume a is of object type, as indicated by @object in .type a, @object.

Next comes the size, which is 1 for a char, hence this line:

.size   a, 1

So I assume that if I had a global int a; then it would be

.size a,4

A char is 1 byte and an int is 4 bytes.

Moving on, I have

a:

So the first few lines become the following. Call this code 1:

# my comment 1
# my comment 2
    .text
    .globl  a
    .bss
    .type   a, @object
    .size   a, 1
a:

So the question is why a: is at the bottom

What if I write it like this instead? Call this code 2:

a:
    .text
    .globl  a
    .bss
    .type   a, @object
    .size   a, 1

so I like to know is code 1 and code 2 same? That is, does it matter whether a: appears after the directives (code 1) or before them (code 2)?

So from the above, my a is in .text, is .globl, is in .bss, has .type @object, and has a size of 1 byte. That is a lot of code to define just one char variable. Is my understanding correct, or should I doubt it?

Moving on, next comes the global main, which is in the .text section and is .globl.

I see

.zero   1
.text
.globl  main
.type   main, @function

main:

I would rather not care about the .zero 1 line, but if I am wrong to ignore it, please tell me what it is for. so again have my gcc place main in .zero (some section???), in the .text section, and made it .globl? The type is @function, so now I know the type comes after the comma, as in .type main, @function and .type a, @object.

Then I encounter complete BS: searching for .LFB0: brought zero google search results.

is .LFB0: a some section of program that my x86-64 processor can run

And .cfi_startproc relates to .eh_frame; I read that .eh_frame is a section that lives in the loaded part of the program. so I like to know if I am coding in assembly can I ignore .cfi_startproc line. But what is the point of it? Does it mean that everything after it is loaded into memory or registers and is part of .eh_frame?

main:
.LFB0:
    .cfi_startproc
    endbr64 
    pushq   %rbp    #
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp  #,
    .cfi_def_cfa_register 6

So if I am writing a simple assembly program similar to the C program above, do I need to write everything from .LFB0: through movq %rsp, %rbp and .cfi_def_cfa_register 6? if not needed then I can assume my program will become

    .text
    .globl  a
    .bss
    .type   a, @object
    .size   a, 1
a:
    .zero   1
    .text
    .globl  main
    .type   main, @function
main:
    .cfi_startproc
    pushq   %rbp
    movq    %rsp, %rbp
    nop
    popq    %rbp
    ret
    .cfi_endproc

So my full program becomes the above. Can anyone please tell me how to compile this with nasm? I believe I have to save it with .s or .S extension which one s small or large S? I am coding in Ubuntu.

This is the full gcc-generated code:

        .file   "test.c"
    # GNU C17 (Ubuntu 11.2.0-7ubuntu2) version 11.2.0 (x86_64-linux-gnu)
    #   compiled by GNU C version 11.2.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.0, isl version isl-0.24-GMP

    # GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
    # options passed: -mtune=generic -march=x86-64 -fasynchronous-unwind-tables -fstack-protector-strong -fstack-clash-protection -fcf-protection
        .text
        .globl  a
        .bss
        .type   a, @object
        .size   a, 1
    a:
        .zero   1
        .text
        .globl  main
        .type   main, @function
    main:
    .LFB0:
        .cfi_startproc
        endbr64 
        pushq   %rbp    #
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp  #,
        .cfi_def_cfa_register 6
    # test.c:2: void main(){};
        nop 
        popq    %rbp    #
        .cfi_def_cfa 7, 8
        ret 
        .cfi_endproc
    .LFE0:
        .size   main, .-main
        .ident  "GCC: (Ubuntu 11.2.0-7ubuntu2) 11.2.0"
        .section    .note.GNU-stack,"",@progbits
        .section    .note.gnu.property,"a"
        .align 8
        .long   1f - 0f
        .long   4f - 1f
        .long   5
    0:
        .string "GNU"
    1:
        .align 8
        .long   0xc0000002
        .long   3f - 2f
    2:
        .long   0x3
    3:
        .align 8
    4:
user786
  • Which command in particular "caused that assembly to appear"? – AKX Feb 23 '22 at 06:48
  • @AKX I compiled my program with `# gcc -save-temps -fverbose-asm test.c -o b.o ` – user786 Feb 23 '22 at 06:51
  • Funny, by the way, that googling `LFB0`, the "complete BS that brought zero google search results" leads to https://stackoverflow.com/questions/15284947/understanding-gcc-s-output as the first result. ;-) – AKX Feb 23 '22 at 06:51
  • Also, by the way, your "simple program" is an incorrect C program to boot, since `main()` must return an `int`. – AKX Feb 23 '22 at 06:55
  • @AKX `Those .cfisomething directives result in generation of additional data by the compiler. This data helps traverse the call stack when an instruction causes an exception, so the exception handler (if any) can be found and correctly executed. The call stack information is useful for debugging` I could not understand who defined this LFB0 section in my processor. I mean, is this a predefined section for x86-64? – user786 Feb 23 '22 at 06:55
  • @AKX assume the thing that calls my main is not expecting anything in return – user786 Feb 23 '22 at 06:56
  • The linked answer says "these assembler directives" about `cfi`... they aren't markers the CPU would care about. – AKX Feb 23 '22 at 06:57
  • @AKX can `these assembler directives` make sense to my processor. How? – user786 Feb 23 '22 at 06:57
  • @AKX you mean they don't get executed by the processor? – user786 Feb 23 '22 at 06:58
  • Assembly doesn't get executed by the processor. If you were to compile the assembly to binary, then disassemble it, the cfi markers will likely have disappeared. – AKX Feb 23 '22 at 06:59
  • @AKX ok, good. How do I generate assembly from a.out or an executable without these directives? – user786 Feb 23 '22 at 07:00
  • See `objdump`, e.g. https://unix.stackexchange.com/questions/343013/how-objdump-disassemble-elf-binary – AKX Feb 23 '22 at 07:03
  • @AKX ok, I read it. So I would like to know: can `sections` be understood only by disassemblers? Does that mean my processor understands only the `listed and vendor-published instructions like movq or movb` that move values from source to destination, plus data? – user786 Feb 23 '22 at 07:18
  • Your processor handles only binary instruction codes and data. Sections, segments etc are conventions that allow assemblers and linkers to generate an 'executable' file with header data that the OS program loader can use to create an environment for the binary to be run. – Martin James Feb 23 '22 at 07:44
  • You only need `.cfi` directives if you want stack-unwinding for exceptions and backtraces to work through your asm functions. e.g. C++ calls your asm, your asm calls another C++ function, that function does a `throw` of a C++ exception, and the parent of your function should catch it. The generated metadata goes in the `.eh_frame` section of your ELF object file / executable, and is data read by the exception handling machinery. – Peter Cordes Feb 23 '22 at 08:11
  • The `.zero 1` at the start of one of the code blocks in this question is the actual space for `char a;`! You showed all the metadata like `.size` and `.globl`, but you omitted the actual `a: .zero 1` to reserve space for it in the `.bss` section the compiler switched to with `.bss`. All of those directives are listed in the GAS manual, e.g. https://sourceware.org/binutils/docs/as/Zero.html – Peter Cordes Feb 23 '22 at 08:13
  • @PeterCordes so you are saying the cfi directives and .zero are important even if I am assembling my code with nasm? – user786 Feb 23 '22 at 08:15
  • The `.zero 1` (or NASM `resb 1`) is absolutely essential! That's part of the bare minimum for your code to work at all, along with the `a:` label. The CFI stuff is only useful if you want exception handling to be able to unwind the stack through your function; most tutorials and a lot of real-world hand-written asm doesn't bother with that. – Peter Cordes Feb 23 '22 at 08:17
  • NASM uses different directives than GAS, so IDK why you're bringing that up for GCC output using GAS syntax. You didn't even use `-masm=intel` to make the instruction syntax similar. (Although it's still different from NASM for RIP-relative addressing, and for labels as memory operands vs. immediates.) – Peter Cordes Feb 23 '22 at 08:19
  • @PeterCordes thanks, but where do I place the `resb 1` instruction? Is it in the same place, replacing `.zero 1` with `.resb 1`? – user786 Feb 23 '22 at 08:33
  • NASM has a manual: https://nasm.us/doc/nasmdoc3.html#section-3.2.2 and there are tons of tutorials and existing Q&As. Plenty of stuff should turn up if you search on "nasm resb". Also it's *not* `.resb`, it's `resb`. NASM directives don't use a leading `.` – Peter Cordes Feb 23 '22 at 08:51
  • @PeterCordes thanks, will look into that. Which assembly instructions does nasm cover? For which machine? – user786 Feb 23 '22 at 09:03
  • PLEASE read the manuals. – the busybee Feb 23 '22 at 09:06
  • `int main(){};` is an error; the `{}` completes the function definition, so `;` forms a separate empty declaration, which is improper in modern C. Did you mean `int main();`? – Eric Postpischil Feb 23 '22 at 14:00
  • `void main(){};` is an error. The return type should be `int`, but, more particularly for this question, the `;` is a problem. The {} completes the function definition, so `;` forms a separate empty declaration, which is improper in modern C. Did you mean `int main();`? – Eric Postpischil Feb 23 '22 at 22:37

1 Answer

.text is a directive that tells the assembler to start a program code section (the “text” section of the program, a read-only executable section containing mostly instructions to be executed). It is here because GCC without optimization always puts a .text at the top of the file, even if it's about to switch to another section (like .bss in this case) and then back to .text when it's ready to emit some bytes into that section (in your case, a definition for main). GCC does still parse the whole compilation unit before emitting any asm, though; it's not just compiling one global variable / function at a time as it goes along.

.globl a is a directive that tells the assembler that a is a “global” symbol, so its definition should be listed as an external symbol for the linker to link with.

.bss is a directive that tells the assembler to start the “block starting symbol” section (which will contain data that is initialized to zero or, on some systems, mostly older, is not initialized).

.type a, @object and .size a, 1 are directives that describe the type and size of an object named a. The assembler adds this information to the symbol table or other information in the object file it outputs. It is useful for debuggers to know about the types of objects.

a: is a label. It acts to define the symbol. As the assembler reads assembly, it counts bytes in the section it is currently generating. Each data declaration or instruction takes up some bytes, and the assembler counts those. When it sees a label, it associates the label with the current count. (This is commonly called the program counter even when it is counting data bytes.) When the assembler writes information about a to the symbol table, it will include the number of bytes it is from the beginning of the section. When the program is loaded into memory, this offset is used to calculate the address where the object a will be in memory.

So the question is why a: is at the bottom

a: must be after .bss because a will be put into the section the assembler is currently working on, so that needs to be set to the desired section before declaring the label. The location of a relative to the other directives might be flexible, so that reordering them would have no consequence.

so I like to know is code 1 and code 2 same?

No, a: must appear after .bss so that it is put into the correct section.

.zero 1 says to emit 1 zero byte in the current section. Like (almost?) all directives GCC uses, it's well documented in the GNU assembler manual: https://sourceware.org/binutils/docs/as/Zero.html

so again have my gcc place main in .zero

No, .text starts (or switches back to) the code section, so main will be in the code section.

is .LFB0: a some section of program that my x86-64 processor can run

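Anything ending with a colon is a label, not a section. .LFB0 is a local label (the .L prefix keeps it out of the object file's symbol table) that GCC emits to mark the beginning of the function, with .LFE0 marking its end, so that debug and unwind metadata can refer to those positions. It produces no machine code, so there is nothing in it for the processor to run.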

so I like to know if I am coding in assembly can I ignore .cfi_startproc line.

When writing assembly for simple functions without exception handling and related features, you can ignore .cfi_startproc and other call-frame information directives that generate metadata that goes in the .eh_frame section. (Which is not executed, it's just there as data in the file for exception handlers and debuggers to read.)

… if not needed then I can assume my program will become…

If you are omitting some of the .cfi… directives, I would omit all of them, unless you look into what they do and determine which ones can be omitted selectively.

I believe I have to save it with .s or .S extension which one s small or large S?

With GCC and Clang, assembly files ending in .S are processed by the “preprocessor” before assembly, and assembly files ending in .s are not. This is the preprocessor familiar from C, with #define, #if, and other directives. Other tools may not do this. If you are not using preprocessor features, it generally does not matter whether you use .s or .S.

Peter Cordes
Eric Postpischil
  • *your source code in this case does not contain any functions, so there is no executable code* - actually the compilation unit shown in the question does contain a `main(){}`. It comes after a global variable declaration, though. – Peter Cordes Feb 23 '22 at 13:19
  • @PeterCordes: Thanks, fixed the `.s` versus `.S` sense. The `main` line seems to be in error; `int main(){};` would be a definition followed by an empty declaration, which is ungrammatical in modern C. I asked OP to clarify. – Eric Postpischil Feb 23 '22 at 14:01
  • `int main(){}` is fully valid in ISO C99 and later, thanks to the implicit `return 0;`. The parameter list is left unspecified, which is also valid if you don't want to bother with `(void)` or `(int argc, char **argv)`. Proof: https://godbolt.org/z/bvMKrfTjn - GCC `-Wall` doesn't complain unless you use `-std=gnu89`. The default is `-std=gnu11` with modern GCC. – Peter Cordes Feb 23 '22 at 22:30
  • @PeterCordes: `int main(){}` would not be a problem. The `;` after `{}` is. Also, OP has `void main(){};`, not `int main(){};`. – Eric Postpischil Feb 23 '22 at 22:35
  • Oh right yes, the return type is a problem for a hosted implementation; only freestanding can usually use `void main(){}`. A stray semicolon is not a syntax error there, though. Empty statements are allowed at any scope including global. https://godbolt.org/z/d31E58hWe shows `gcc -Wall -ffreestanding` compiling it without warnings. Or just a warning without freestanding. So it's 100% believable that they compiled that exact source to get the asm output they show, like on Godbolt without filters: https://godbolt.org/z/E7WrKv81r – Peter Cordes Feb 23 '22 at 22:59
  • @PeterCordes: Compiling with `-std=c99 -pedantic` produces a warning. There is no *statement* in the grammar outside functions, just *declaration* and *function-definition* grammar tokens. Neither of those can be empty; a *declaration* must have *declaration-specifiers* or a *static_assert-declaration*, and neither of those can be empty. In any case, OP’s source code with a function definition conflicts with their assembly showing an empty text section, so I suspect they made a mistake presenting the code, and the question should be edited to clarify that. – Eric Postpischil Feb 23 '22 at 23:03
  • Huh, thanks for the correction, I hadn't realized empty `;` weren't 100% legal at global scope in C. Re: the initial empty `.text`: looks like GCC just puts that at the top of the file, even if it's not going to emit anything into that section before switching to another one. (And then back to `.text` later). The godbolt link in my previous comment shows `gcc -O0` output for this code, which has that happening. https://godbolt.org/z/YnEqb1x5e shows `-O0 -g0` to avoid the clutter of directives to emit debug metadata – Peter Cordes Feb 23 '22 at 23:11