
I was reading my OS textbook and it mentioned that virtual address translation can be used to implement data breakpoints (for program debugging). I only know that the debugger uses INT 3 to pause the program, and that local and global variables are handled in some way through the debug control and address registers. But after some digging I only found information about linear addresses in the debug registers, and no articles or discussions about the mechanism behind data breakpoints on virtual addresses. So how exactly does this work?

Peter Cordes
Travis Su
  • It's unclear what exactly you mean but you could unmap or protect a page containing an address of interest so you'd get a page fault on access that you can then handle in your debugger. – Jester Apr 15 '20 at 00:33
  • One linear address can have multiple different virtual addresses. Any program that has sufficient privilege to access the debug registers (DR0-DR7 on the '386) should be able to determine the linear address that corresponds to a virtual address that is to be monitored. – 1201ProgramAlarm Apr 15 '20 at 01:01
  • @Jester I just want to understand how a debugger works with virtual addresses instead of linear addresses, and the relation between data breakpoints and local/global variables. See https://pdos.csail.mit.edu/6.828/2008/readings/i386/s12_02.htm#fig12-1 and https://pdos.csail.mit.edu/6.828/2008/readings/i386/s12_03.htm, as I am too green for this. – Travis Su Apr 15 '20 at 01:02
  • @1201ProgramAlarm I read here https://pdos.csail.mit.edu/6.828/2008/readings/i386/s05_02.htm that "A linear address refers indirectly to a physical address by specifying a page table". So does that actually mean that when a linear address is translated to a physical address, the actual translation is from a corresponding virtual address to a physical address? – Travis Su Apr 15 '20 at 01:05

1 Answer


Linear addresses are virtual, in x86 terminology. x86 memory addressing goes:

  • addressing mode like [ebp + eax*4] to "effective address" (the offset part of a seg:off). (And every addressing mode implies a segment, if you don't manually override with [fs: rdi] for example. Normally DS, unless the base register is R/E/BP or R/ESP in which case SS. Or for implicit addressing modes as part of e.g. push rax or stosb, it depends on the instruction.)
  • seg:off -> linear by adding the segment base to the offset.
  • translation of that linear address to physical. (And if virtualizing, from guest-physical to true physical.)

All steps are done by the CPU hardware, first using the segment base, then using the page table pointed to by CR3 (or the TLB, which caches translations from that page table).
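The three steps above can be modelled in a few lines. This is only a toy sketch: the function names (`effective_address`, `to_linear`, `to_physical`), the sample register values, and the one-level page table stored in a dict are all invented for illustration. Real x86 hardware walks multi-level page tables and does all of this without software involvement.

```python
PAGE_SIZE = 4096  # 4 KiB pages


def effective_address(base, index=0, scale=1, disp=0, bits=32):
    """Step 1: addressing mode -> effective address (the offset part of seg:off)."""
    return (base + index * scale + disp) & ((1 << bits) - 1)


def to_linear(seg_base, offset, bits=32):
    """Step 2: seg:off -> linear (virtual) address by adding the segment base."""
    return (seg_base + offset) & ((1 << bits) - 1)


def to_physical(linear, page_table):
    """Step 3: linear -> physical, using a toy one-level page table.

    page_table is a dict mapping virtual page number -> physical frame number
    (an assumption of this sketch, not the real x86 format).
    """
    vpn, page_offset = divmod(linear, PAGE_SIZE)
    if vpn not in page_table:
        raise LookupError(f"page fault at linear address {linear:#x}")
    return page_table[vpn] * PAGE_SIZE + page_offset


# Hypothetical values: [ebp + eax*4] with EBP=0x1000, EAX=3, flat segment (base 0)
off = effective_address(base=0x1000, index=3, scale=4)
lin = to_linear(seg_base=0, offset=off)
phys = to_physical(lin, {0x1: 0x80})  # virtual page 1 maps to frame 0x80
```

With a flat memory model (segment base 0), `off` and `lin` are the same number, which is why "virtual address" and "linear address" are usually treated as synonyms on mainstream x86 operating systems.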

The hardware debug registers for hardware breakpoints / watchpoints use virtual addresses. https://en.wikipedia.org/wiki/X86_debug_register explains it as follows:

The addresses in these registers are linear addresses. If paging is enabled, the linear addresses are translated into physical addresses by the processor's paging mechanism. If paging is not enabled, these linear addresses are the same as physical addresses.

That implies that a watchpoint won't trigger when you access the same physical address from a different virtual address than the one you put in the debug register, because the comparison is made on the linear address, before paging. (If that description on Wikipedia is accurate; I'd test it and/or check Intel or AMD's manuals if that matters.)
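As a rough model of what the quoted text describes, the check can be pictured as a range comparison on the linear address of each access. The function name `watch_hits` and the sample addresses are invented for illustration. The overlap rule used here (any byte of the access falling inside the aligned watched range) is how Intel's manuals describe data breakpoints, but treat it as an assumption of this sketch and verify it against the manuals if it matters.

```python
def watch_hits(dr_addr, dr_len, access_addr, access_size):
    """Return True if an access overlaps the watched linear-address range.

    dr_len must be 1, 2, 4 or 8 bytes. The low address bits are masked off,
    which models the requirement that the watched address be aligned to its length.
    """
    if dr_len not in (1, 2, 4, 8):
        raise ValueError("length must be 1, 2, 4 or 8 bytes")
    start = dr_addr & ~(dr_len - 1)
    end = start + dr_len
    return access_addr < end and access_addr + access_size > start


WATCHED = 0x00404010  # hypothetical linear address of a 4-byte global

hit_inside = watch_hits(WATCHED, 4, 0x00404012, 1)  # byte access inside the range
hit_next = watch_hits(WATCHED, 4, 0x00404014, 4)    # the adjacent dword
hit_span = watch_hits(WATCHED, 4, 0x0040400E, 4)    # unaligned access straddling the start
```

In this model `hit_inside` and `hit_span` are true and `hit_next` is false. Nothing in the comparison looks at the physical address.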

I don't actually know the details; I know x86 has a TF flag and debug registers, and have a general idea of the things they can do, but I've never written code to use them.


I only know that the debugger uses INT 3 to pause the program

A "hardware breakpoint" means the CPU will stop without software having to rewrite the executing code to 0xCC int3. The debug registers can do this, and can also detect access to certain memory locations by any instruction.

So you can set a watchpoint to break when anything in your program reads or writes a certain global variable in memory, letting you find code that modifies it through a pointer or something. And since it's HW supported, you can run at full speed instead of having to single-step and have software check every access.
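To make the watchpoint idea concrete, here is a sketch of how the control word in DR7 could be built for one of the four address slots (DR0-DR3). The bit positions follow the DR7 layout documented by Intel and summarised on the Wikipedia page linked above; the helper name `dr7_bits` and the constant names are invented here. Actually loading the debug registers needs kernel privilege, so a user-space debugger goes through an OS interface (such as ptrace on Linux), which this sketch leaves out.

```python
# R/W field values: what kind of access triggers the breakpoint
RW_EXECUTE = 0b00    # instruction fetch (hardware code breakpoint)
RW_WRITE = 0b01      # data writes only
RW_READWRITE = 0b11  # data reads or writes

# LEN field encoding; note that 4 bytes is 0b11 and 8 bytes is 0b10
# (the 8-byte encoding is only defined on 64-bit capable CPUs)
LEN_ENCODING = {1: 0b00, 2: 0b01, 4: 0b11, 8: 0b10}


def dr7_bits(slot, rw, length, local=True):
    """Return the DR7 bits that enable one breakpoint slot (0-3).

    Bits 0-7 hold the L0/G0 .. L3/G3 enable flags; each slot then has a
    4-bit group starting at bit 16: two R/W bits followed by two LEN bits.
    """
    if not 0 <= slot <= 3:
        raise ValueError("x86 has only four address slots, DR0-DR3")
    enable_bit = 2 * slot + (0 if local else 1)
    value = 1 << enable_bit
    value |= rw << (16 + 4 * slot)
    value |= LEN_ENCODING[length] << (18 + 4 * slot)
    return value


# Break on writes to a 4-byte variable whose linear address is in DR0
dr7 = dr7_bits(slot=0, rw=RW_WRITE, length=4)
```

A debugger would combine this value with any bits already set in DR7 for other slots, and store the variable's linear address in the matching DRn register.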


Peter Cordes
  • You, sir, absolutely cleared some things up for me. I also want to know how things work when I set a data breakpoint on global variables vs local variables. I read that the process is slightly different on the register side, but none of the sources explain that any further. – Travis Su Apr 15 '20 at 01:11
  • @TravisSu: Are you asking about local vars optimized into a register? You can't set a watchpoint on a register, only a memory address. That's why debug builds make sure a variable always has an address, like in the C abstract machine: [Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](https://stackoverflow.com/q/53366394). When you're debugging a specific instance of a running program, the debugger knows the address of that instance of a local var that you're setting a breakpoint on. – Peter Cordes Apr 15 '20 at 01:16
  • Not exactly that. For example, when I am programming in x86 assembly language, local variables are usually referenced by an offset from the EBP register, but I am not sure about global variables. I wanted to know how data breakpoints work in these cases. – Travis Su Apr 15 '20 at 01:29
  • When a data breakpoint is set on a local variable, there is a chance that a different function will use the EBP register to point to its own stack frame. Does the debugger need to do something to avoid that? – Travis Su Apr 15 '20 at 01:31
  • @TravisSu: Debug registers don't care *what* the address is; they trigger based on matching linear address, after address generation and seg:off to linear. See the top of my answer. To *program* the debug regs appropriately, a debugger just needs to calculate what the address of a local variable is (using debug-info metadata and the current value of EBP or ESP, depending on whether the function was optimized to not use EBP as a traditional frame pointer.) – Peter Cordes Apr 15 '20 at 01:33
  • @TravisSu: like I said, one specific local variable in one function's stack frame will have an address that nothing else is using for the lifetime of that variable. If you leave a watchpoint set after a function returns, though, the next function to reuse that stack space for something might trigger the watchpoint. (In the call tree, that might be a *sibling* of the one where you set the watchpoint. Or a parent, if the parent uses `alloca` or allocates a variable-length array after a function call returns.) – Peter Cordes Apr 15 '20 at 01:35