
Here is an example found via an assembly website. This is the C code:

 int main()
 {
     int a = 5;
     int b = a + 6;
     return 0;
 }

Here is the associated assembly code:

    (gdb) disassemble
    Dump of assembler code for function main:
    0x0000000100000f50 <main+0>:    push   %rbp
    0x0000000100000f51 <main+1>:    mov    %rsp,%rbp
    0x0000000100000f54 <main+4>:    mov    $0x0,%eax
    0x0000000100000f59 <main+9>:    movl   $0x0,-0x4(%rbp)
    0x0000000100000f60 <main+16>:   movl   $0x5,-0x8(%rbp)
    0x0000000100000f67 <main+23>:   mov    -0x8(%rbp),%ecx
    0x0000000100000f6a <main+26>:   add    $0x6,%ecx
    0x0000000100000f70 <main+32>:   mov    %ecx,-0xc(%rbp)
    0x0000000100000f73 <main+35>:   pop    %rbp
    0x0000000100000f74 <main+36>:   retq   
    End of assembler dump.

I can safely assume that this line of assembly code:

  0x0000000100000f6a <main+26>:   add    $0x6,%ecx

correlates to this line of C:

     int b = a + 6;

But is there a way to extract which lines of assembly are associated with a specific line of C code?
In this small sample it's not too difficult, but in larger programs, with much more code to debug, it gets cumbersome.

Unhandled Exception
  • Which compiler are you using? If using GCC, there is an option to output annotated assembly with comments mapping it to the C code. I'm sure clang offers a similar solution. – Morten Jensen Jun 21 '17 at 11:37
  • See [this](https://stackoverflow.com/a/137479/1870232) – P0W Jun 21 '17 at 11:42
  • Yes. Compile with debug symbols and then look at it in the debugger again. – David Hoelzer Jun 21 '17 at 11:58
  • Possible duplicate of [How do you get assembler output from C/C++ source in gcc?](https://stackoverflow.com/questions/137038/how-do-you-get-assembler-output-from-c-c-source-in-gcc) – Cody Gray - on strike Jun 21 '17 at 12:34
  • Compile with the -ggdb3 option. Then all the line number and related information is available in the object file. – user3629249 Jun 21 '17 at 20:23

4 Answers


But is there a way to extract which lines of assembly are associated with a specific line of C code?

Yes, in principle: your compiler can probably do it (for example, GCC's -fverbose-asm option, used together with -S). Alternatively, objdump -lSd or similar will disassemble a program or object file with source and line number annotations where available.

In general though, for a large optimized program, this can be very hard to follow.

Even with perfect annotation, you'll see the same source line mentioned multiple times as expressions and statements are split up, interleaved and reordered, and some instructions are associated with multiple source expressions.

In that case there is no shortcut: you have to reason about the relationship between your source and the assembly yourself, and that takes some effort.
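To see that splitting and interleaving for yourself, a small function can be compiled twice and the annotated disassemblies compared. This is only a sketch: the file name sum.c and the function weighted_sum are hypothetical, and the commands in the comment assume GCC and binutils are installed.

```c
/*
 * Assumed workflow (sum.c is a hypothetical file name):
 *   gcc -g -O0 -c sum.c -o sum_O0.o && objdump -lSd sum_O0.o
 *   gcc -g -O2 -c sum.c -o sum_O2.o && objdump -lSd sum_O2.o
 * In the -O0 listing each source line tends to map to one contiguous
 * run of instructions. In the -O2 listing the same line numbers may
 * appear several times, out of order, or not at all.
 */
#include <stddef.h>

/* Hypothetical example: sums (3 * value + 1) over an array. */
int weighted_sum(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        int scaled = values[i] * 3;  /* candidate for merging with the next line */
        total += scaled + 1;
    }
    return total;
}
```

No main is needed here, because the file is only compiled to an object file (-c) and then disassembled.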

Useless
  • Yes, and it gets much worse (for a human reader) as soon as you turn optimizations on. – edmz Jun 21 '17 at 12:12

One of the best tools I've found for this is Matthew Godbolt's Compiler Explorer.

It features multiple compiler toolchains, auto-recompiles, and it immediately shows the assembly output with colored lines to show the corresponding line of source code.

lapinozz

First, compile the program so that its object file keeps information about the source code, using the -gdwarf flag, the -g flag, or both. Next, if you want to debug, it is important to have the compiler avoid optimizations; otherwise the correspondence between code and assembly is difficult to see.

    gcc -gdwarf -g3 -O0 prog.c -o out

Next, tell the disassembler to output the source code. The --source flag implies the --disassemble flag.

    objdump --source out

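If you do want to look at optimized output as well, note that in the question's sample a and b are never used, so an optimizing compiler is free to drop them and leave nothing to map back to. A minimal sketch of one way around that, assuming you are free to restructure the sample (add_six is a hypothetical name): make the input a parameter and the result the return value, so the addition has an observable effect.

```c
/*
 * Variant of the question's sample in which the addition cannot be
 * discarded: the input arrives as a parameter and the result is
 * returned to the caller. add_six is a hypothetical name.
 * Compile with debug info, with or without optimization, and
 * disassemble as described above.
 */
int add_six(int a)
{
    int b = a + 6;   /* the line to look for in the annotated output */
    return b;
}
```

With this shape, the instruction that performs the addition should still be present and attributed to its source line even in an optimized build, although its exact form depends on the compiler and target.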
alinsoar

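@Useless is quite right. Still, one trick for finding where a C statement ended up in the machine code is to inject markers into it; for instance:

    /* GNU-style inline assembly: emits three nops as a visible marker.
       No trailing semicolon, so that "ASM_MARK;" is a single statement. */
    #define ASM_MARK do { __asm__ __volatile__("nop; nop; nop;\n\t" :::); } while (0)

    int main()
    {
        int a = 5;
        ASM_MARK;
        int b = a + 6;
        ASM_MARK;
        return 0;
    }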

You will see:

    main:
            pushq   %rbp
            movq    %rsp, %rbp
            movl    $5, -4(%rbp)
            nop; nop; nop;

            movl    -4(%rbp), %eax
            addl    $6, %eax
            movl    %eax, -8(%rbp)
            nop; nop; nop;

            movl    $0, %eax
            popq    %rbp
            ret

You need the __volatile__ qualifier, or your compiler's equivalent, to tell the compiler not to remove the marker. Inline assembly is a compiler extension and its spelling is often compiler-specific (notice the __), as standard C does not provide this kind of syntax.
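Because the markers are compiler-specific and can influence code generation, it may be worth making them easy to switch off. The following is only a sketch under that assumption: ENABLE_ASM_MARK is a hypothetical build switch and marked_add_six a hypothetical function; the nop mnemonic assumes a target whose assembler accepts it, such as x86.

```c
/*
 * ASM_MARK() emits three nops only when ENABLE_ASM_MARK (a hypothetical
 * switch, e.g. passed as -DENABLE_ASM_MARK) is defined and the compiler
 * understands GNU-style inline assembly. Otherwise it expands to an
 * empty expression, so the same source builds everywhere.
 */
#if defined(ENABLE_ASM_MARK) && (defined(__GNUC__) || defined(__clang__))
#define ASM_MARK() __asm__ __volatile__("nop; nop; nop")
#else
#define ASM_MARK() ((void)0)
#endif

/* Hypothetical function with the statement of interest bracketed. */
int marked_add_six(int a)
{
    ASM_MARK();
    int b = a + 6;
    ASM_MARK();
    return b;
}
```

Building once with the switch and once without lets you check whether the markers themselves changed the surrounding code.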

edmz
  • This is very GNU compiler-specific, and may also interfere with the optimizer, so while it is a trick to be aware of, it is also one to wield with caution. There are much better ways. – Cody Gray - on strike Jun 21 '17 at 12:35
  • @CodyGray: That is a good point. I've made it more explicit that nothing is read or written by the asm, so very likely no data-saving instructions will be emitted around it at all. – edmz Jun 21 '17 at 14:59
  • I understand what you are thinking, and it does *seem* that inserting some `NOP`s with no input/output operands would have no effect. But, see [here](https://stackoverflow.com/questions/13955162/why-does-adding-assembly-comments-cause-such-radical-change-in-generated-code) for a demonstration of how the insertion of a *comment* using inline assembly can disrupt vectorization of code. – Cody Gray - on strike Jun 21 '17 at 15:43