
I am a student and have just started studying assembly language. To understand it better, I wrote a short program in C and converted it to assembly language. To my surprise, I didn't understand a bit of it.

The code is:

#include<stdio.h>

int main()
{
    int n;
    n=4;
    printf("%d",n);
    return 0;
}

And the corresponding assembly language is:

    .file   "delta.c"
    .section    .rodata
.LC0:
    .string "%d"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushl   %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl    %esp, %ebp
    .cfi_def_cfa_register 5
    andl    $-16, %esp
    subl    $32, %esp
    movl    $4, 28(%esp)
    movl    $.LC0, %eax
    movl    28(%esp), %edx
    movl    %edx, 4(%esp)
    movl    %eax, (%esp)
    call    printf
    movl    $0, %eax
    leave
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
    .section    .note.GNU-stack,"",@progbits

What do these mean?

Wilfred Hughes
Nithin Jose
  • Which part is giving you trouble? We can't explain every line, if you're at that level you need to start by reading a book and not by jumping directly into something incomprehensible. Tell us which parts you understand and which parts you don't understand. – Gilles 'SO- stop being evil' Jul 22 '13 at 18:18
  • There are a few major concepts in your example which would get lost in the process of explaining instructions line by line. If you have little or no understanding of assembly instructions, you should get a book or some online material starting with the basics. Once you are familiar with how the instructions work, then the bigger concepts of managing a stack frame, registers/memory, and functional calling conventions can be covered. – lurker Jul 22 '13 at 18:24
  • Actually, I don't know much about assembly language; all I know is a few instructions like mov and add. I'd better follow your advice. – Nithin Jose Jul 22 '13 at 18:29
  • Probably just the stack frame manipulation stuff making you sad. Starting with movl $4, 28(%esp) it should look pretty familiar (when comparing against source). – Brian Knoblauch Jul 22 '13 at 19:51
  • You can get a listing file that shows the combined C and assembly language from this command: `gcc -c -g -Wa,-a,-ad [other GCC options] foo.c > foo.lst` The combined listing will be in foo.lst. [Source](http://www.delorie.com/djgpp/v2faq/faq8_20.html) – Jim Mischel Jul 22 '13 at 19:53
  • If you want to do yourself a favor, start with MIPS assembler. – Raphael Jul 22 '13 at 19:58
  • Voting to close as too broad. – Ciro Santilli OurBigBook.com Nov 18 '15 at 20:25
  • The only reasons I'm not downvoting and voting to close is that this question has a useful answer which might make a good duplicate target or reference. Questions like "I have no idea about any of this stuff, please write me a walkthrough" are not appropriate. – Peter Cordes Aug 04 '16 at 03:09

2 Answers

Let's break it down:

.file   "delta.c"

The compiler uses this to record the source file that the assembly came from. It has no effect on the generated code; the assembler just records the file name in the object file's symbol table.

.section    .rodata

This starts a new section. "rodata" is the name of the "read-only data" section. Data in this section is written to the executable and gets memory-mapped in as read-only when the program is loaded. Because those pages are never modified, all the processes that load the same image can share the same copy of its ".rodata" pages.

Generally any "compile-time-constants" in your source code that can't be optimized away into assembly intrinsics will end up being stored in the "read only data section".
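A small C sketch of the difference, assuming GCC on an ELF target such as Linux: a string literal used through a pointer is a compile-time constant that typically lands in .rodata, whereas an array initialized from a literal is a separate, writable object. The function names are hypothetical, chosen only for illustration.

```c
#include <string.h>

/* The literal "%d" is a compile-time constant; with GCC on ELF it is
   typically emitted into .rodata, just like the .LC0 string above. */
static const char *format_in_rodata(void)
{
    return "%d";
}

/* An array initialized from a literal is its own writable object:
   the compiler copies the characters into it when the function runs. */
static int modify_local_copy(void)
{
    char buf[] = "%d";   /* local copy on the stack */
    buf[0] = '#';        /* fine: buf is not read-only */
    return buf[0] == '#' && buf[1] == 'd' && strlen(buf) == 2;
}

/* Writing through the pointer returned by format_in_rodata() would be
   undefined behavior; on Linux it usually crashes, because the pages
   holding .rodata are mapped read-only. */
```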

.LC0:
    .string "%d"

The ".LC0" part is a label. It provides a symbolic name that refers to the bytes that follow it in the file; here, ".LC0" represents the string "%d". The GNU assembler uses the convention that labels starting with ".L" (on ELF targets) are "local labels": they can be used within the file but are normally not written to the object file's symbol table. This has a technical meaning that is mostly interesting to people who write compilers and linkers. Here, the compiler uses it to refer to data that is private to a particular object file, namely a string constant.

.text

This starts a new section. The "text" section is the section in object files that stores executable code.

.globl  main

The ".globl" directive (GAS also accepts the spelling ".global") tells the assembler to add the label that follows it to the list of symbols "exported" by the generated object file. This basically means "this is a symbol that should be visible to the linker". For example, a non-static function in C can be called from any C file that declares (or includes) a compatible function prototype. This is why you can #include stdio.h and then call printf. When any non-static C function is compiled, the compiler generates assembly that declares a global label pointing at the beginning of the function. Contrast this with things that shouldn't be linked, such as string literals. The assembly code in the object file still needs a label to refer to the literal data, but those are "local" symbols.
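To see this for yourself, you could compile a file like the following with `gcc -S` and compare what is emitted for the two functions. This is a sketch: the function names are made up for illustration, and with optimization enabled GCC may inline `private_double` so that it disappears from the output entirely.

```c
#include <stdio.h>

/* Non-static function: GCC emits ".globl exported_add" for it, so the
   linker can resolve calls to it from other object files. */
int exported_add(int a, int b)
{
    return a + b;
}

/* static function: no ".globl" is emitted, so the symbol stays local
   to this object file and other files cannot link against it. */
static int private_double(int x)
{
    return 2 * x;
}

/* Uses both; printf itself is a global symbol defined in the C library
   and resolved by the linker. Returns the number of characters printed. */
int print_sum(void)
{
    return printf("%d\n", exported_add(private_double(2), 1));
}
```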

.type   main, @function

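How GAS (the GNU assembler) handles ".type" directives depends on the target. On ELF targets such as Linux, "main, @function" sets the symbol's type to "function" in the object file's symbol table, which tells the linker and tools such as debuggers that the label "main" refers to executable code, as opposed to data.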

main:

This defines the entry point for your "main" function.

.LFB0:

This is a "local label" that refers to the start of the function.

    .cfi_startproc

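This is a "call frame information" (CFI) directive. It marks the start of a function for which the assembler should emit DWARF-format call frame information, which debuggers and exception handlers use to unwind the stack.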

    pushl   %ebp

This is a standard part of a function "prologue" in assembly code. It saves the current value of the "ebp" register. The "ebp" or "base" register is used to store the "base" of the stack frame within a function. Whereas the "esp" ("stack pointer") register can change as values are pushed and functions are called within a function, "ebp" remains fixed. Any arguments to the function can always be accessed relative to "ebp". By ABI calling conventions, before a function can modify the EBP register it must save it, so that the original value can be restored before the function returns.

    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8

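These update the call frame information after the push: the canonical frame address (CFA) is now 8 bytes above esp (the return address plus the saved ebp), and register 5 (ebp, in DWARF's i386 register numbering) has been saved at CFA-8.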

    movl    %esp, %ebp

GAS uses AT&T syntax, in which the operand order is reversed compared with the Intel manuals: the source comes first and the destination second. So this means "set ebp equal to esp", which establishes the "base pointer" for the rest of the function.

    .cfi_def_cfa_register 5
    andl    $-16, %esp
    subl    $32, %esp

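These are also part of the function's prologue. The ".cfi_def_cfa_register 5" directive records that the CFA is now computed relative to ebp instead of esp. "andl $-16, %esp" rounds the stack pointer down to a multiple of 16 to align it, and "subl $32, %esp" then reserves enough room for all the locals of the function plus the outgoing arguments for printf.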
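The arithmetic in these two instructions can be checked with a few lines of C. This is a sketch that treats %esp as a plain 32-bit number; the function names are hypothetical.

```c
#include <stdint.h>

/* "andl $-16, %esp": -16 in two's complement is 0xFFFFFFF0, so the AND
   clears the low four bits and rounds the value DOWN to a multiple of 16.
   Rounding down is safe because the stack grows toward lower addresses. */
uint32_t align_down_16(uint32_t esp)
{
    return esp & (uint32_t)-16;
}

/* "subl $32, %esp": reserve 32 bytes below the aligned pointer for the
   local n (at 28(%esp)) and printf's arguments (at (%esp) and 4(%esp)). */
uint32_t reserve_frame(uint32_t esp)
{
    return align_down_16(esp) - 32;
}
```

Since 32 is itself a multiple of 16, the stack pointer is still 16-byte aligned after the subtraction.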

    movl    $4, 28(%esp)

This stores the 32-bit integer constant 4 into a slot in the stack frame. That slot, 28(%esp), is the variable n.

    movl    $.LC0, %eax

This loads the address of the "%d" string constant defined above into eax.

    movl    28(%esp), %edx

This loads the value 4 stored at offset 28 in the stack into edx. Reloading a value that was just stored like this suggests your code was compiled with optimizations turned off.

    movl    %edx, 4(%esp)

This then copies the value 4 to 4(%esp), the stack slot where printf expects its second argument.

    movl    %eax, (%esp)

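This stores the address of the "%d" string at (%esp), the slot where printf expects its first argument. In the 32-bit calling convention used here, arguments are passed on the stack, with the first argument at the lowest address.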
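As a rough mental model (not how the hardware or the compiler actually works), the moves above can be replayed in C against a 32-byte array standing in for main's frame. Everything here is a hypothetical sketch, including the names and the use of a table index in place of the .LC0 address, since a 32-bit address slot cannot hold a pointer on a 64-bit host.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Offsets from %esp in main's 32-byte frame, as in the assembly. */
enum { FRAME_SIZE = 32, ARG0 = 0, ARG1 = 4, VAR_N = 28 };

/* Stands in for the .LC0 string in .rodata; index 0 plays its "address". */
static const char *const rodata_strings[] = { "%d" };

static void store32(uint8_t *frame, int offset, uint32_t value)
{
    memcpy(frame + offset, &value, sizeof value);
}

static uint32_t load32(const uint8_t *frame, int offset)
{
    uint32_t value;
    memcpy(&value, frame + offset, sizeof value);
    return value;
}

/* Replays main's body step by step; returns what printf returns. */
int simulated_main_body(uint8_t frame[FRAME_SIZE])
{
    store32(frame, VAR_N, 4);               /* movl $4, 28(%esp)        */
    uint32_t edx = load32(frame, VAR_N);    /* movl 28(%esp), %edx      */
    store32(frame, ARG1, edx);              /* movl %edx, 4(%esp)       */
    store32(frame, ARG0, 0);                /* movl %eax, (%esp): .LC0  */
    /* call printf: the callee finds its arguments at (%esp) and 4(%esp). */
    return printf(rodata_strings[load32(frame, ARG0)], (int)load32(frame, ARG1));
}
```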

    call    printf

This calls printf.

    movl    $0, %eax

This sets eax to 0. Given that the next instructions are "leave" and "ret", this is equivalent to "return 0" in C code: the EAX register is used to hold a function's return value.

    leave

This instruction cleans up the stack frame. It copies EBP back into ESP, which discards the locals, and then pops the saved EBP value off the stack. Like the "ret" that follows, this is part of the function's epilogue.
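The prologue and epilogue work as a pair, and it can help to replay them on a toy stack. The following is a minimal C sketch under stated assumptions: a made-up `Cpu` struct, a 256-byte array standing in for memory, and addresses that are just offsets into that array.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model of the 32-bit stack: addresses are offsets
   into mem[], and the stack grows toward lower addresses. */
typedef struct {
    uint8_t  mem[256];
    uint32_t esp, ebp;
} Cpu;

static void push32(Cpu *c, uint32_t v)
{
    c->esp -= 4;
    memcpy(c->mem + c->esp, &v, 4);
}

static uint32_t pop32(Cpu *c)
{
    uint32_t v;
    memcpy(&v, c->mem + c->esp, 4);
    c->esp += 4;
    return v;
}

/* pushl %ebp; movl %esp, %ebp; andl $-16, %esp; subl $32, %esp */
void prologue(Cpu *c)
{
    push32(c, c->ebp);        /* save the caller's frame pointer */
    c->ebp = c->esp;          /* new frame base                  */
    c->esp &= (uint32_t)-16;  /* align down to 16 bytes          */
    c->esp -= 32;             /* room for locals and call args   */
}

/* leave (movl %ebp, %esp; popl %ebp), then ret (pop the return address) */
uint32_t epilogue(Cpu *c)
{
    c->esp = c->ebp;          /* discard locals, however much was reserved */
    c->ebp = pop32(c);        /* restore the caller's frame pointer        */
    return pop32(c);          /* ret: address the caller resumes at        */
}
```

Because "leave" restores esp from ebp, the epilogue does not need to know how much the prologue subtracted or how much alignment padding the "andl" introduced.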

    .cfi_restore 5
    .cfi_def_cfa 4, 4

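This is more call frame information: ".cfi_restore 5" says that ebp (register 5) once again holds the caller's value, and ".cfi_def_cfa 4, 4" says the CFA is once more esp (register 4) plus 4, since only the return address is left on the stack.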

    ret

This is the actual return instruction. It pops the return address off the stack and jumps back to the caller.

    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
    .section    .note.GNU-stack,"",@progbits
Scott Wisniewski
  • Of course, on a modern operating system, every executable (to a first approximation) is a shared object. So `rodata` isn't just shared between different programs using a shared library, it's also shared between different instances of the same program, if multiple instances happen to be running at the same time. Check out John Levine's fine book on the subject of linking and loading for further details. http://www.iecc.com/linker/ – Pseudonym Jul 24 '13 at 02:01
  • Yes, I know this. I should have been more precise in my language. Thanks for pointing it out. – Scott Wisniewski Jul 24 '13 at 02:55
  • If you want to know which high-level source code is responsible for the generated assembly, you can always use the following command to inspect it: gcc -Wa,-adhln=delta.lst -g delta.c – Puttaraju Apr 23 '14 at 07:48
  • The phrasing of compile-time constants being "optimized away into assembly intrinsics" leaves something to be desired. "intrinsics" has a specific technical meaning (function-like things in C, like `_mm_popcnt_u32`) that isn't appropriate here. I don't have any great ideas that are totally accurate, though. Some constants will just end up as immediates in the instruction stream, while constant-propagation will turn other constants into removed branches or fully unrolled small loops. So it wouldn't be accurate to just say "for constants that don't compile into immediates". – Peter Cordes Aug 04 '16 at 02:58
  • The `.cfi_*` noise is stack-unwind info, used by exception handlers as well as debuggers. It's generated even without `-g`, and isn't removed by `strip`. It's needed for exception handlers to be able to unwind the stack through functions compiled with `-fomit-frame-pointer` (the default at `-O2`). Notice how there's a .cfi directive every time `%esp` changes, and directives indicating which register was saved at which point. But for humans reading the asm to see what it does, they're just noise. http://stackoverflow.com/questions/38552116/how-to-remove-noise-from-gcc-clang-assembly-output – Peter Cordes Aug 04 '16 at 03:03

For me, Intel syntax is easier to read, and learning how to generate Intel syntax is handy for understanding C programs better:

gcc -S -masm=intel file.c

On Windows (with MinGW GCC), your C program becomes:

    .file   "file.c"
    .intel_syntax noprefix
    .def    ___main;    .scl    2;  .type   32; .endef
    .section .rdata,"dr"
LC0:
    .ascii "%d\0"
    .text
    .globl  _main
    .def    _main;  .scl    2;  .type   32; .endef
_main:
LFB13:
    .cfi_startproc
    push    ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    mov ebp, esp
    .cfi_def_cfa_register 5
    and esp, -16
    sub esp, 32
    call    ___main
    mov DWORD PTR [esp+28], 4
    mov eax, DWORD PTR [esp+28]
    mov DWORD PTR [esp+4], eax
    mov DWORD PTR [esp], OFFSET FLAT:LC0
    call    _printf
    mov eax, 0
    leave
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc
LFE13:
    .ident  "GCC: (rev2, Built by MinGW-builds project) 4.8.1"
    .def    _printf;    .scl    2;  .type   32; .endef

(The compiler options should be the same on Ubuntu as on Windows.)

Apart from the cryptic labels, this looks more like the assembly I read about in textbooks.

Here is a way of looking at it:

    call    ___main                          ; MinGW runtime initialization

    mov DWORD PTR [esp+28], 4                ; n = 4;

    mov eax, DWORD PTR [esp+28]
    mov DWORD PTR [esp+4], eax
    mov DWORD PTR [esp], OFFSET FLAT:LC0
    call    _printf                          ; printf("%d",n);

    mov eax, 0
    leave                                    ; return 0;
James
  • Yes, more textbooks use Intel format. I started off assembly programming (25ish years ago) in Intel format. Today, I find AT&T format easier to read apart from one case: the insane 386 indirect addressing modes. By any reckoning, `[ecx*2+12]` is easier to read than `12(,%ecx,2)`. – Pseudonym Jul 24 '13 at 01:52
  • Ahh, that's interesting. Is it that AT&T style was more productive, or did you just end up using their syntax more than Intel's? Perhaps in time I will begin to prefer the AT&T way of things as well. – James Jul 25 '13 at 18:08
  • I found it more productive, and mostly easier to read. The `DWORD PTR`s get to you after a while, in a COBOL kind of way, and x86-64 only makes it worse. – Pseudonym Jul 26 '13 at 01:16
  • Did `.global` come from nasm or Intel syntax in general? I always use `.global` with nasm (in Intel mode; does it even support at&t? I wouldn't use it for that anyways if it did), and if I was practicing at&t syntax, I'd use GAS with `.globl`. Should I always use `.global` with Intel and `.globl` with at&t regardless of assembler? For maximum portability? – RastaJedi Aug 04 '16 at 02:02
  • @RastaJedi: NASM and GAS use totally different directives, even though they support similar mnemonics and syntax when GAS is in `.intel_syntax noprefix` mode. NASM's `global` directive doesn't have a `.`. Obviously you should always use `global symbol_name` in NASM. For GAS, use `.globl` because that's what gcc does. IDK if it even supports `.global`. – Peter Cordes Aug 04 '16 at 03:08
  • Yeah, I meant `global`, whoops. But yeah, GAS supports `.global` and I've seen it sometimes. Cool, I'll just stick to using one for each. – RastaJedi Aug 04 '16 at 03:10