6

I was wondering how debuggers are able to step line by line through the source code.

Once the source code is compiled and I run the program, how does a debugger know the correspondence between the machine-level instructions and the higher-level statements?

For example, if I set a breakpoint on one line in my source file, how does the debugger know which machine-level instruction to stop at?

sudoer123
  • The compiler can emit debug information if requested, e.g. using `gcc -g`. If it's not present (either because it wasn't there to start with or was later stripped) then the debugger won't be able to map source code. – Jester May 14 '19 at 18:16
  • The compiler keeps notes. – Seva Alekseyev May 14 '19 at 18:22
  • Understand that optimized code doesn't have a one-to-one or one-to-group relationship with the high-level language. It is for the most part smoke and mirrors: notes are taken as described in the answers, but in reality you are not stepping through "lines" of C code; it is more of a rough estimate. Ideally you want to step through the machine code/asm if you feel the need to use a debugger. – old_timer May 14 '19 at 18:50
  • Also note that each toolchain is different, so at best this would be a gcc answer, since you tagged gcc. Besides not being a Stack Overflow question anyway, it is also too broad. – old_timer May 14 '19 at 18:51

2 Answers

4

Look at the asm output from `gcc -g -S` and you'll see `.loc` debug-info directives (plus `.file` and so on) marking the block of asm that corresponds to each C source line.

(With optimization enabled the same line can map to multiple non-contiguous instructions, so it gets much trickier, but compilers still try to be useful and map most instructions to some source line even if they're really the result of optimization and doing an operation that doesn't appear in the source...).

https://godbolt.org/ uses the same debug info as debuggers do, but uses it for color highlighting to match source lines with asm.
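
As a rough illustration of that kind of source-to-asm matching, here is a minimal Python sketch that groups instructions under the most recent `.loc` directive. The assembly text is hand-written and simplified, not real compiler output, and `group_by_source_line` is a hypothetical helper name. Real tools handle many more cases (multiple files, labels, data directives).

```python
import re
from collections import defaultdict

# Hand-written, simplified stand-in for `gcc -g -S` output (an assumption,
# not real compiler output). `.loc FILENO LINENO COLUMN` tags what follows.
SAMPLE_ASM = """\
    .file 1 "demo.c"
    .loc 1 3 0
    movl $0, -4(%rbp)
    .loc 1 4 0
    movl -4(%rbp), %eax
    addl $1, %eax
    .loc 1 5 0
    ret
"""

LOC_RE = re.compile(r"^\s*\.loc\s+(\d+)\s+(\d+)")


def group_by_source_line(asm_text):
    """Map each source line number to the instructions listed under it."""
    groups = defaultdict(list)
    current_line = None
    for raw in asm_text.splitlines():
        text = raw.strip()
        match = LOC_RE.match(raw)
        if match:
            # The second operand of .loc is the source line number.
            current_line = int(match.group(2))
        elif (text and not text.startswith(".")
              and not text.endswith(":") and current_line is not None):
            # Anything that is not a directive or a label counts as an instruction.
            groups[current_line].append(text)
    return dict(groups)


for line_no, instructions in group_by_source_line(SAMPLE_ASM).items():
    print(line_no, instructions)
```

With optimization enabled, the same line number can show up under several separate `.loc` directives, which is why one source line may end up with scattered instructions.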


When an assembler assembles these directives, it creates debug info in the .o object file, which is eventually linked into an executable or library. That debug info can also be split out into a separate debug-symbols file, or stripped entirely.

It's this debug-info that debuggers read.
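
To make the lookup concrete, here is a minimal Python sketch of how a debugger might use such a table, assuming the debug info has already been decoded into sorted (address, line) rows. The addresses, line numbers, and function names are all made up for illustration; real line tables carry more fields, such as file and column numbers and end-of-sequence markers.

```python
from bisect import bisect_right

# Hypothetical decoded line table: (start address, source line), sorted by address.
# Line 7 appears twice to mimic optimized code, where one source line can map
# to non-contiguous instruction ranges.
LINE_TABLE = [
    (0x1000, 5),
    (0x1004, 7),
    (0x100C, 6),
    (0x1010, 7),
    (0x1018, 8),
]
ADDRESSES = [addr for addr, _ in LINE_TABLE]


def line_for_address(pc):
    """Return the source line of the last row starting at or before `pc`."""
    index = bisect_right(ADDRESSES, pc) - 1
    return LINE_TABLE[index][1] if index >= 0 else None


def breakpoint_addresses(line):
    """Return every row start address recorded for `line`."""
    return [addr for addr, row_line in LINE_TABLE if row_line == line]


def step_to_next_line(pcs):
    """Given successive program-counter values, return the first one on a new line."""
    start_line = line_for_address(pcs[0])
    for pc in pcs[1:]:
        if line_for_address(pc) != start_line:
            return pc
    return None


print(line_for_address(0x1006))                  # source line for that address
print([hex(a) for a in breakpoint_addresses(7)])  # where to plant breakpoints
```

Setting a breakpoint on a line then amounts to picking addresses from `breakpoint_addresses`, and stepping "one line" amounts to running instructions until `line_for_address` reports a different line.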

(Debug info also includes info about which named C variables are stored where, and what their types are. For locals, the locations are relative to the stack frame for the function that contains them.)
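
A small Python sketch of the frame-relative idea, under stated assumptions: the variable names, offsets, addresses, and the little-endian layout are invented for illustration, and `read_local` is a hypothetical helper. Real debug info can also place variables in registers or describe locations that change over the course of a function.

```python
import struct

# Hypothetical debug-info entries for one function:
# variable name -> (offset from the frame base, struct format for its type).
LOCALS = {
    "count": (-4, "<i"),   # 32-bit int, 4 bytes below the frame base
    "total": (-16, "<q"),  # 64-bit int, 16 bytes below the frame base
}


def read_local(memory, memory_base, frame_base, name):
    """Decode local `name` from `memory`, a snapshot starting at `memory_base`."""
    offset, fmt = LOCALS[name]
    start = frame_base + offset - memory_base
    return struct.unpack_from(fmt, memory, start)[0]


# Fake 32-byte stack snapshot covering addresses 0x7000..0x701F.
STACK_BASE = 0x7000
FRAME_BASE = 0x7020
stack = bytearray(32)
struct.pack_into("<i", stack, FRAME_BASE - 4 - STACK_BASE, 42)
struct.pack_into("<q", stack, FRAME_BASE - 16 - STACK_BASE, 1000)

print(read_local(stack, STACK_BASE, FRAME_BASE, "count"))
print(read_local(stack, STACK_BASE, FRAME_BASE, "total"))
```

The type information tells the debugger how many bytes to read and how to interpret them; the location information tells it where to start reading.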

Peter Cordes
  • WOW! This `godbolt` reference! I had no idea such an incredible tool existed! Only randomly found it through your answer. I guess now there is no other option for me but to dive into the depths of assembler... – ScienceDiscoverer Aug 04 '22 at 07:25
  • @ScienceDiscoverer: Matt Godbolt (who created that site) gave a talk at CppCon 2017, [“What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”](https://youtu.be/bSkpMdDe4g4), so that and [How to write hello world in assembler under Windows?](https://stackoverflow.com/q/1023593) are a good starting point for diving in. – Peter Cordes Aug 04 '22 at 07:30
3

The compiler and linker can produce so-called debug symbols that contain this information. The debug information contains the mapping from source file and line to address, the addresses of global variables, the start addresses of all functions, and the offsets of their local variables on the stack. With gcc, the `-g` compiler option does this. The debug symbol information can be embedded in the executable, as is typically the case with gcc, or stored in separate symbol files (.pdb files with MSVC).
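
As one example of using the function start addresses, here is a minimal Python sketch that turns a raw address into a `function+offset` label, the way a backtrace might. The function names and addresses are hypothetical, and the sketch assumes each function extends up to the start of the next one.

```python
from bisect import bisect_right

# Hypothetical symbol entries: (start address, function name), sorted by address.
FUNCTIONS = [
    (0x1000, "main"),
    (0x1040, "parse_args"),
    (0x10A0, "print_usage"),
]
STARTS = [start for start, _ in FUNCTIONS]


def function_for_address(pc):
    """Return 'name+offset' for the function whose start is at or before `pc`."""
    index = bisect_right(STARTS, pc) - 1
    if index < 0:
        return None  # address lies before the first known function
    start, name = FUNCTIONS[index]
    return f"{name}+{pc - start:#x}"


print(function_for_address(0x1044))
```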

Sami Sallinen