
I have the assembly code of some code that will be executed at some point in the program, but I don't know its address in memory.

Is it possible to make gdb break when the current instruction matches a given instruction?

For example, I want gdb to break whenever it reaches this instruction:

leaq        0x000008eb(%rip),%rax
Ciro Santilli OurBigBook.com
Tyilo

3 Answers


As others said, it is likely impossible to do it efficiently because there is no hardware support.

But if you really want to do it, this Python command can serve as a starting point. It single-steps until it reaches an instruction with the given opcode. Since control returns to Python after every instruction, it will be unusably slow if the target instruction is many instructions away. But for a more or less nearby instruction, it should work fine.

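import gdb


class ContinueI(gdb.Command):
    """
Continue until instruction with given opcode.

    ci OPCODE

Example:

    ci callq
    ci mov
"""
    def __init__(self):
        # super(ContinueI, self) instead of super() so the script also
        # loads in GDB builds that embed Python 2.
        super(ContinueI, self).__init__(
            'ci',
            gdb.COMMAND_BREAKPOINTS,
            gdb.COMPLETE_NONE,
            False
        )

    def invoke(self, arg, from_tty):
        arg = arg.strip()
        if arg == '':
            gdb.write('Argument missing.\n')
            return
        thread = gdb.selected_thread()
        while thread is not None and thread.is_valid():
            # Execute exactly one machine instruction, hiding GDB's output.
            gdb.execute('si', to_string=True)
            if not thread.is_valid():
                break  # the thread (or the whole inferior) exited
            frame = gdb.selected_frame()
            pc = frame.pc()
            instruction = frame.architecture().disassemble(pc)[0]['asm']
            # Compare the whole mnemonic, so operand-less instructions
            # such as "cpuid" or "ret" match too.
            words = instruction.split()
            if words and words[0] == arg:
                gdb.write(instruction + '\n')
                break


ContinueI()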

Save it, for example as gdb.py, and load it with:

source gdb.py

and use the command as:

ci mov
ci callq

and you will be left on the first executed instruction with the given opcode.

TODO: this will ignore your other breakpoints.
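The matching rule is the fragile part of such a script. Comparing raw strings with something like `startswith(opcode + ' ')` relies on GDB padding the mnemonic with whitespace, which some GDB/binutils versions do not do for operand-less instructions such as `cpuid` or `ret`, and a `lock` or `rep` prefix hides the real mnemonic. Below is a minimal sketch of a standalone comparison that could be used inside `invoke`; the `PREFIXES` set is an assumption, not an exhaustive list. Depending on the GDB/binutils version, the same instruction may also be printed as `callq` or `call`, so pass the spelling your GDB shows.

```python
# Instruction prefixes GDB may print before the mnemonic.
# Assumed, non-exhaustive list; extend it for your target.
PREFIXES = {'lock', 'rep', 'repz', 'repe', 'repnz', 'repne', 'bnd', 'notrack'}


def mnemonic(asm):
    """Return the mnemonic of a disassembled instruction, skipping prefixes."""
    words = asm.split()
    # Keep the last word even if it looks like a prefix ("rep" on its own).
    while len(words) > 1 and words[0] in PREFIXES:
        words.pop(0)
    return words[0] if words else ''


def matches(asm, opcode):
    """Exact comparison: 'call' does not match 'callq'."""
    return mnemonic(asm) == opcode
```

The trade-off of skipping prefixes is that searching for a prefix itself (`ci lock`) no longer matches anything.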

For the particular common case of syscall, you can use catch syscall: https://reverseengineering.stackexchange.com/questions/6835/setting-a-breakpoint-at-system-call

Ciro Santilli OurBigBook.com
  • *"it is likely impossible to do it efficiently because there is no hardware support...."* - there's no hardware support to break on a function name, but GDB manages to do it. – jww Aug 21 '15 at 00:00
  • @jww strictly speaking you are right, but I think it is clear what I mean: GDB could of course parse the entire text section and put software breakpoints on all those opcodes, and that is easy to do in Python. Or you can single step like I'm doing. But that is going to take much longer than going through the symbol table to find a few functions and put breakpoints there. Also, there will be thousands of opcodes vs one function name, so execution is going to be slow even if you put breakpoints previously. – Ciro Santilli OurBigBook.com Aug 21 '15 at 06:48
  • @jww Also, I think this is worth mentioning because hardware support for it would be feasible, since the processor already decodes opcodes. Hardware support for function breakpoints, on the other hand, would require an ELF-parsing processor :-) – Ciro Santilli OurBigBook.com Aug 21 '15 at 06:51
  • @CiroSantilli烏坎事件2016六四事件法轮功 Your script gives me: `TypeError: super() takes at least 1 argument (0 given)` in gdb. – JohnnyFromBF Oct 22 '16 at 16:43
  • @JohnnyFromBF possibly Python 3 vs Python 2 problem? – Ciro Santilli OurBigBook.com Oct 22 '16 at 17:07
  • I'm on 2.7.9. Guess I'd have to recompile gdb in order to change it according to [this](http://stackoverflow.com/questions/26243956/how-to-change-the-python-interpreter-that-gdb-uses). Maybe you can reproduce it. – JohnnyFromBF Oct 22 '16 at 17:13
  • You just have to replace `super().__init__(` with `super(ContinueI, self).__init__` and it works. – JohnnyFromBF Oct 28 '16 at 08:18
  • "use the command as" tells you to write "breaki", but the code registers itself as "ci". To run it you need to write ci, not breaki – S. Kaczor Dec 01 '21 at 01:27

I don't know the address of the code in memory.

What prevents you from finding that address? Run objdump -d, find the instruction of interest, note its address. Problem solved? (This is trivially extended to shared libraries as well.)
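This approach can be scripted. The sketch below filters `objdump -d` output (AT&T syntax, as GNU objdump prints it by default) with a regular expression and turns each match into a GDB `break *ADDRESS` command. The sample listing is hand-written for illustration, and `load_base` is an assumption you must supply yourself: for a PIE executable or a shared library, objdump prints offsets, so the address at which the object was loaded has to be added. Other disassemblers (for example `otool` on macOS) use a different line layout, so `LINE_RE` would need adjusting there.

```python
import re

# One instruction line of `objdump -d` output: "  addr:\t<hex bytes>\t<asm>".
# Continuation lines of long encodings have no asm column and are skipped.
LINE_RE = re.compile(r'^\s*([0-9a-f]+):\t[0-9a-f ]+\t(.*)$')

# Hand-written sample in objdump's layout (illustrative, not a real binary).
SAMPLE = (
    "0000000000001119 <main>:\n"
    "    1119:\t55                   \tpush   %rbp\n"
    "    111a:\t48 89 e5             \tmov    %rsp,%rbp\n"
    "    111d:\t48 8d 05 eb 08 00 00 \tlea    0x8eb(%rip),%rax        # 1a0f <msg>\n"
    "    1124:\t0f a2                \tcpuid\n"
    "    1126:\t5d                   \tpop    %rbp\n"
    "    1127:\tc3                   \tret\n"
)


def find_instructions(listing, asm_regex):
    """Return (address, asm) pairs whose disassembly matches asm_regex."""
    wanted = re.compile(asm_regex)
    hits = []
    for line in listing.splitlines():
        m = LINE_RE.match(line)
        if m and wanted.search(m.group(2)):
            hits.append((int(m.group(1), 16), m.group(2).strip()))
    return hits


def break_commands(hits, load_base=0):
    """Turn hits into GDB commands; load_base relocates PIE/shared-library offsets."""
    return ['break *0x%x' % (load_base + addr) for addr, _ in hits]
```

GNU objdump typically prints this instruction as `lea` rather than `leaq`, so a pattern such as `r'^leaq?\s+0x8eb\(%rip\),\s*%rax'` accepts both spellings. A real listing can be obtained with `subprocess.run(['objdump', '-d', path], capture_output=True, text=True).stdout`, and the resulting commands can be pasted into GDB.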

Employed Russian
  • It is a QuickLook plugin, so I don't know how it's loaded and called. – Tyilo Dec 25 '12 at 16:12
  • What prevents GDB from doing this for us? Computers should make our lives easier, not harder :) – jww Aug 20 '15 at 23:57

No, this is not possible and it would also be very inefficient to implement.

Debuggers typically support two kinds of breakpoints:

  • Hardware Breakpoints: The debugger asks the CPU to raise a debug exception when some event occurs, for example when a particular address is executed or a memory location is written.
  • Software Breakpoints: The debugger replaces the opcode at the breakpoint's address with a special "trap" instruction (int 3 / 0xcc on the x86 architecture).
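To make the software-breakpoint mechanism concrete, here is a toy Python model (not GDB's actual implementation) that patches `int3` (0xCC) into a byte array standing in for the text section. The class name and the bytes are made up for illustration. Note that `insert` needs an address, which is exactly what is missing in the question.

```python
INT3 = 0xCC  # x86 one-byte breakpoint instruction


class SoftwareBreakpoints:
    """Toy model: a debugger patching int3 into a copy of the code bytes."""

    def __init__(self, text):
        self.text = bytearray(text)
        self.saved = {}  # address -> original byte overwritten by int3

    def insert(self, addr):
        if addr not in self.saved:
            self.saved[addr] = self.text[addr]
            self.text[addr] = INT3

    def remove(self, addr):
        # Put the original byte back so the instruction can execute normally.
        self.text[addr] = self.saved.pop(addr)

    def hit(self, pc):
        # On x86, int3 is a trap: the reported PC is one past the 0xCC byte,
        # so the debugger backs up by one to find which breakpoint fired.
        return (pc - 1) in self.saved
```

For example, with `push %rbp; mov %rsp,%rbp; ret` (`55 48 89 e5 c3`), inserting a breakpoint at offset 1 replaces `48` with `cc`, and a trap reported at PC 2 is attributed to that breakpoint.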

Breaking on a particular opcode would therefore require either CPU support for that kind of hardware breakpoint, or the instruction's address, so that the debugger can place a software breakpoint there.

In theory, the debugger could just search the entire memory for the instruction's byte sequence, but since the byte sequence could also occur in the middle of an instruction or in data, it may get false positives.
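A naive version of that search is sketched below. The buffer is hand-assembled for illustration (an assumption, not taken from the question's binary): the question's `leaq 0x8eb(%rip),%rax` encodes as `48 8d 05 eb 08 00 00`, and the same seven bytes also appear inside the 64-bit immediate of a `movabs` and in a chunk of data, so only one of the three hits is a real instruction.

```python
# Encoding of `leaq 0x8eb(%rip),%rax`: REX.W (48), opcode 8d,
# ModRM 05 (RIP-relative, destination %rax), then disp32 0x8eb little-endian.
PATTERN = bytes.fromhex('48 8d 05 eb 08 00 00')

# Hand-assembled buffer, assumed for illustration only.
MEMORY = (
    bytes.fromhex('48 b8 48 8d 05 eb 08 00 00 90')  # 0:  movabs $0x90000008eb058d48,%rax
    + bytes.fromhex('48 8d 05 eb 08 00 00')         # 10: the real leaq
    + bytes.fromhex('c3')                           # 17: ret
    + bytes.fromhex('48 8d 05 eb 08 00 00')         # 18: data that happens to match
)


def find_all(memory, pattern):
    """Return every offset where pattern occurs, overlapping matches included."""
    offsets = []
    pos = memory.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = memory.find(pattern, pos + 1)
    return offsets


print(find_all(MEMORY, PATTERN))  # [2, 10, 18]: only offset 10 is a real instruction
```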

Because x86 instructions are variable-length, control can jump to arbitrary addresses, and code can modify itself, it is also not trivial to disassemble an entire region of memory to find a particular instruction.
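The decoding problem can be illustrated with a deliberately tiny decoder. `TOY_OPCODES` is a hypothetical length table that knows only the four encodings used in the buffer; a real x86 decoder must handle prefixes, ModRM/SIB bytes and much more. Decoding the same bytes from offset 0 and from offset 2 produces different instruction streams, and the stream starting at 2 contains a `leaq` that never executes if control enters at offset 0.

```python
# Hypothetical length table covering only the encodings in CODE below.
TOY_OPCODES = [
    (bytes.fromhex('48 b8'), 'movabs', 10),   # movabs $imm64,%rax
    (bytes.fromhex('48 8d 05'), 'leaq', 7),   # leaq disp32(%rip),%rax
    (bytes.fromhex('90'), 'nop', 1),
    (bytes.fromhex('c3'), 'ret', 1),
]

CODE = (
    bytes.fromhex('48 b8 48 8d 05 eb 08 00 00 90')  # movabs whose imm64 holds leaq bytes
    + bytes.fromhex('48 8d 05 eb 08 00 00')         # leaq 0x8eb(%rip),%rax
    + bytes.fromhex('c3')                           # ret
)


def linear_sweep(code, start):
    """Decode sequentially from start; return (offset, mnemonic) pairs."""
    out = []
    pc = start
    while pc < len(code):
        for prefix, name, length in TOY_OPCODES:
            if code.startswith(prefix, pc):
                out.append((pc, name))
                pc += length
                break
        else:
            # Unknown byte: record it and move on one byte.
            out.append((pc, '(bad)'))
            pc += 1
    return out
```

Here `linear_sweep(CODE, 0)` sees `movabs`, `leaq`, `ret`, while `linear_sweep(CODE, 2)` sees an extra `leaq` at offset 2 followed by a `nop`. Both sweeps happen to resynchronize at offset 10, which is why a misdecoded region is easy to overlook.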

So the only reliable way to find the instruction in arbitrary machine code would be single-stepping at the instruction level. This would be extremely expensive: even a trivial library call such as printf() could take minutes on today's hardware if you single-step every instruction.

Martin Baulig
  • *"... and it would also be very inefficient to implement."* - I'm not sure about this. A naive implementation may be inefficient, like string comparing each mnemonic when executed. But asking GDB to do what Employed Russian suggested seems reasonable. In my case, I want to break on calls to `CPUID`. There's only four or five calls to it, so it seems like GDB doing what Employed Russian suggested would be perfect for me so I don't have to waste the time. – jww Aug 20 '15 at 23:57
  • For example, on most ARM architectures (ignoring Thumb), instructions are always 4 bytes long. – Andre Holzner Jun 03 '20 at 11:03