
I'm trying to dynamically count the number of function calls and returns of a program at runtime on x86_64 (Intel syntax).

To do this I'm using ptrace (without PTRACE_SYSCALL): I read the RIP register (which contains the address of the next instruction) and check the opcode stored at that address. I know that a function CALL can be recognized when the first byte of the instruction (the least significant byte of the word returned by PTRACE_PEEKDATA) is 0xE8 (according to the Intel documentation, http://icube-avr.unistra.fr/fr/images/4/41/253666.pdf page 105).
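Because x86 is little-endian, the opcode at regs.rip is the least significant byte of the word PTRACE_PEEKDATA returns, which is why `& 0xFF` extracts it. A minimal sketch that instead copies the whole word into a byte array, so that later bytes such as the ModR/M byte are easy to reach. `word_to_bytes` is a hypothetical helper name, and the example assumes sizeof(long) == 8 as on x86_64 Linux:

```c
#include <string.h>

/* Hypothetical helper: split a word returned by PTRACE_PEEKDATA into the
 * bytes it holds in the tracee's memory. memcpy keeps the in-memory order,
 * so bytes[0] is the byte at the peeked address (the opcode byte when the
 * address is regs.rip), bytes[1] the next one, and so on. */
void word_to_bytes(long word, unsigned char bytes[sizeof(long)])
{
        memcpy(bytes, &word, sizeof word);
}
```

For example, for the word 0x441f0f0009d12de8 quoted in the comments below, bytes[0] is 0xE8 and bytes[1] is 0x2D.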

I looked up each instruction on http://ref.x86asm.net/coder64.html. So in my program, each time I find 0xE8, 0x9A, 0xF1, etc., I count a function entry (CALL or INT instruction), and if it's 0xC2, 0xC3, etc., I count a function exit (RET instruction).
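The byte values in the question mix several kinds of instructions: 0xCC, 0xCD and 0xF1 are software interrupts (INT3, INT n, INT1) rather than CALLs, and 0xFF cannot be classified without its ModR/M byte. Also, in the question's code 0xCF (IRET) appears in both lists; since the RET test runs first, it is only ever counted as a return. A sketch that keeps these categories apart (the enum and `classify_opcode` are hypothetical names, and prefixes are assumed to have been skipped already):

```c
/* Sketch: classify an opcode byte. The categories are chosen for this
 * example only; they are not an official Intel classification. */
enum insn_kind {
        KIND_OTHER,
        KIND_CALL,        /* E8 call rel32; 9A call far (invalid in 64-bit mode) */
        KIND_RET,         /* C2/C3 near ret, CA/CB far ret */
        KIND_IRET,        /* CF iret */
        KIND_INTERRUPT,   /* CC int3, CD int imm8, F1 int1 */
        KIND_NEEDS_MODRM  /* FF group: inc/dec/call/jmp/push, see reg field */
};

enum insn_kind classify_opcode(unsigned char opcode)
{
        switch (opcode) {
        case 0xE8: case 0x9A:
                return KIND_CALL;
        case 0xC2: case 0xC3: case 0xCA: case 0xCB:
                return KIND_RET;
        case 0xCF:
                return KIND_IRET;
        case 0xCC: case 0xCD: case 0xF1:
                return KIND_INTERRUPT;
        case 0xFF:
                return KIND_NEEDS_MODRM;
        default:
                return KIND_OTHER;
        }
}
```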

The goal is to do this for any program at runtime: I don't have access to the test program's compilation, so I can't use instrumentation or gcc's magic tools.

I wrote a little program that can be compiled with gcc -Wall -Wextra your_file.c and run by typing ./a.out a_program.

Here is my code:

#include <sys/ptrace.h>
#include <signal.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <stdint.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

typedef struct user_regs_struct    reg_t;

static int8_t       increase(pid_t pid, int32_t *status)
{
        if (WIFEXITED(*status) || WIFSIGNALED(*status))
                return (-1);
        if (WIFSTOPPED(*status) && (WSTOPSIG(*status) == SIGINT))
                return (-1);
        if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) == -1)
                return (-1);
        return (0);
}

int                 main(int argc, char *argv[])
{
    pid_t           pid = fork();
    long            address_rip;
    uint16_t        call = 0;
    uint16_t        ret = 0;
    int32_t         status;
    reg_t           regs;

    if (!pid) {
            if ((status = ptrace(PTRACE_TRACEME, 0, NULL, NULL)) == -1)
                    return (1);
            kill(getpid(), SIGSTOP);
            execvp(argv[1], argv + 1);
    } else {
            while (42) {
                    waitpid(pid, &status, 0);
                    ptrace(PTRACE_GETREGS, pid, NULL, &regs);
                    address_rip = ptrace(PTRACE_PEEKDATA, pid, regs.rip, NULL);
                    address_rip &= 0xFFFF;
                    if ((address_rip & 0x00FF) == 0xC2 || (address_rip & 0x00FF) == 0xC3 ||
                        (address_rip & 0x00FF) == 0xCA || (address_rip & 0x00FF) == 0xCB ||
                        (address_rip & 0x00FF) == 0xCF)
                            ret += 1;
                    else if ((address_rip & 0x00FF) == 0xE8 || (address_rip & 0x00FF) == 0xF1 ||
                             (address_rip & 0x00FF) == 0x9A || (address_rip & 0x00FF) == 0xCC ||
                             (address_rip & 0x00FF) == 0xCD || (address_rip & 0x00FF) == 0xCF)
                            call += 1;
                    if (increase(pid, &status) == -1) {
                            printf("call: %i\tret: %i\n", call, ret);
                            return (0);
                    }
            }
    }
    return (0);
}

When I run it with a_program (a custom program that simply enters some local functions and makes some write syscalls; the goal is just to count the functions this program enters and leaves), no error occurs and it works fine, BUT I don't get the same number of CALLs and RETs. Example:

user> ./a.out basic_program

call: 636 ret: 651

(The large number of calls and rets is caused by libc, which goes through a lot of functions before starting your program; see Parsing Call and Ret with ptrace.)

In fact, it looks like my program executes more returns than function calls. However, I found that opcode 0xFF is used for CALL and CALLF (r/m64 or r/m16/m32), but also for other instructions like DEC, INC and JMP (which are very common).

So how can I tell them apart? According to http://ref.x86asm.net/coder64.html this is done with the "opcode fields", but how do I find them?
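For opcode 0xFF the instruction is selected by the reg field (bits 5 to 3) of the ModR/M byte that follows the opcode; that is what the `FF /digit` notation in Intel's manual and in the coder64 table means. A sketch of that decoding (`ff_group_mnemonic` is a hypothetical helper, and any prefixes are assumed to have been skipped already):

```c
/* Sketch: name the instruction selected by the reg field (bits 5..3) of
 * the ModR/M byte that follows opcode 0xFF ("FF /digit" in Intel's tables). */
const char *ff_group_mnemonic(unsigned char modrm)
{
        static const char *const names[8] = {
                "inc",       /* FF /0 */
                "dec",       /* FF /1 */
                "call near", /* FF /2 */
                "call far",  /* FF /3, memory operand only */
                "jmp near",  /* FF /4 */
                "jmp far",   /* FF /5, memory operand only */
                "push",      /* FF /6 */
                "(invalid)"  /* FF /7 */
        };
        unsigned reg = (modrm >> 3) & 7;   /* extract bits 5..3 */

        return names[reg];
}
```

For example, `FF D0` (`call rax`) has ModR/M 0xD0 = 11 010 000, so reg is 2 and the instruction is a near call, while `FF C0` (`inc eax`) has reg 0.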

If I add 0xFF to my condition:

else if ((address_rip & 0x00FF) == 0xE8 || (address_rip & 0x00FF) == 0xF1 ||
         (address_rip & 0x00FF) == 0x9A || (address_rip & 0x00FF) == 0xCC ||
         (address_rip & 0x00FF) == 0xCD || (address_rip & 0x00FF) == 0xCF ||
         (address_rip & 0x00FF) == 0xFF)
                call += 1;

and run it:

user> ./a.out basic_program

call: 1152 ret: 651

That seems normal to me, because it counts every JMP, DEC and INC, so I need to distinguish between the different 0xFF instructions. I tried this:

else if ((address_rip & 0x00FF) == 0xE8 || (address_rip & 0x00FF) == 0xF1 ||
         (address_rip & 0x00FF) == 0x9A || (address_rip & 0x00FF) == 0xCC ||
         (address_rip & 0x00FF) == 0xCD || (address_rip & 0x00FF) == 0xCF ||
         ((address_rip & 0x00FF) == 0xFF && ((address_rip & 0x0F00) == 0X02 ||
         (address_rip & 0X0F00) == 0X03)))
                call += 1;

But it gives me the same result. Am I wrong somewhere? How can I get the same number of calls and rets?
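In the last attempt, `(address_rip & 0x0F00) == 0X02` can never be true: masking with 0x0F00 keeps only bits 8 to 11, so the result is either 0 or a multiple of 0x100. The ModR/M byte occupies bits 8 to 15 of the peeked word, so its reg field (bits 3 to 5 of that byte) sits at bits 11 to 13 of the word, i.e. mask 0x3800, and `FF /2` and `FF /3` correspond to 0x1000 and 0x1800. A sketch of the corrected test (`is_ff_call` is a hypothetical name; like the question's code it assumes the opcode is the first byte, so REX-prefixed forms such as `call r8`, encoded 41 FF D0, are missed):

```c
#include <stdbool.h>

/* Sketch: detect "FF /2" and "FF /3" directly in the word returned by
 * PTRACE_PEEKDATA. Byte 0 (bits 0..7) is the opcode and byte 1
 * (bits 8..15) is the ModR/M byte, so the ModR/M reg field (bits 5..3)
 * sits at bits 13..11 of the word. Prefixes are not handled. */
bool is_ff_call(long word)
{
        unsigned long w = (unsigned long)word;
        unsigned long reg_bits = w & 0x3800UL;      /* bits 13..11 */

        if ((w & 0xFFUL) != 0xFF)
                return false;
        return reg_bits == (2UL << 11) || reg_bits == (3UL << 11);
}
```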

Peter Cordes
Paul-Marie
  • Instead of ptrace you can compile your code with [instrumentation](https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html) – Eugene Sh. May 04 '18 at 16:52
  • The goal is to do this for any program at runtime; I don't have access to the test program's compilation, so I can't use instrumentation or gcc's magic tools. – Paul-Marie May 04 '18 at 16:56
  • Then it is nearly impossible. – Eugene Sh. May 04 '18 at 16:59
  • The second byte of many instructions, including the FF opcode is a modr/m byte. The reg field is used as an extended opcode for FF, if it is 2 or 3, the instruction is a CALL instruction. That's what the `FF /2` opcode description in Intel's manual says. – fuz May 04 '18 at 17:01
  • @fuz Yep, that's what I understood too, but when I try to check its value, it doesn't give me the right result, or maybe I didn't do it correctly (see my last block of code) – Paul-Marie May 04 '18 at 17:03
  • @VolontéDuPeuple The reg field is only three bits, not four. Also, it's not located at the least significant bit but rather at bits 3 to 5. The field at bits 0 to 2 is the r/m field which is not what you need. Refer to the Intel manuals for details. – fuz May 04 '18 at 17:06
  • @EugeneSh.: It's not impossible, it just means you need binary instrumentation tools like https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool. On Broadwell and newer, there's hardware support for tracing branches (Intel PT: https://software.intel.com/en-us/blogs/2013/09/18/processor-tracing) so you can record a trace (with low overhead) and then use it to see all the instructions that executed. – Peter Cordes May 04 '18 at 18:59
  • https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool is exactly what you need – Mark Segal May 05 '18 at 12:17
  • Maybe, but (as I said in my post) I'm looking for a way to achieve my goal without any external tools or libraries. I know it's possible because a school project has a similar problem (trace CALL and RET ONLY with the ptrace() syscall and without the PTRACE_SYSCALL flag). To succeed, I need to parse the instruction at the next-instruction register (RIP) and determine whether it's a RET or a CALL. It's possible without any tools, I guarantee it. – Paul-Marie May 05 '18 at 13:25
  • @VolontéDuPeuple Have you tried fixing your code as I told you in my previous comment? – fuz May 05 '18 at 18:47
  • I agree with fuz. If you need more information, you should read the Intel manual: https://software.intel.com/sites/default/files/managed/a4/60/325383-sdm-vol-2abcd.pdf –  May 06 '18 at 14:30
  • @fuz Yes, but your answer wasn't very explicit. I think I see what you mean, but it's still incomplete. First, the ptrace syscall returns a long, so 8 bytes of 8 bits each. So are your "bits 5 to 3" part of the last byte? If ptrace's return value is 0x441f0f0009d12de8, for example, I check the last BYTE (here 0xE8); must I then check bits 3, 4 and 5 of the next byte (so 0x2D)? If so, I tried it like this: if ((address_rip & 0xFF) == 0xFF && ((address_rip & 0b0001110000000000) == 0b010 || (address_rip & 0b0001110000000000) == 0b011))) – Paul-Marie May 06 '18 at 14:31
  • @AlejandroVisiedo Thanks to both of you for linking the Intel manual, but let me remind you that I've already read it (page 105 onward is the part I focused on), and even linked it in my question. If I'm here, it's because I've already done my best (and still am) to find where I'm wrong. – Paul-Marie May 06 '18 at 14:36
  • @VolontéDuPeuple I am not familiar with the `ptrace` system call. x86 machine code is laid out as a byte stream with each byte being viewed on its own and each instruction being composed of 1–15 bytes. So the first thing you should do is convert whatever `ptrace` gives you into an array of bytes. The first byte is the opcode, the second byte is the modr/m byte for some opcodes. The reg field is inside the modr/m byte. `(address_rip & 0b0001110000000000) == 0b010` can never hold true and `0b010` isn't valid C syntax so I'm not sure what you are going for. – fuz May 06 '18 at 22:58
  • @fuz : The `0b` for binary constants is a GCC extension. – Michael Petch May 07 '18 at 03:01
  • @fuz As explained in https://stackoverflow.com/questions/31134113/differencing-the-instruction-of-the-same-opcode, with `(address_rip & 0b0001110000000000) == 0b010` I'm trying to check bits 5 to 3. To sum up: if the primary opcode is 0xFF, I check the next byte, and if its bits 5 to 3 are equal to `0b010` or `0b011`, I assume it's a CALL. Am I wrong? – Paul-Marie May 07 '18 at 07:00
  • @VolontéDuPeuple The code would work if you wrote `(address_rip & 0b0001110000000000) == 0b01000000000`. You can't just ignore these trailing bits in the value you compare with. – fuz May 07 '18 at 07:49
  • @MichaelPetch I know. It's still not standard C and not something that works on most compilers. – fuz May 07 '18 at 07:49
  • @fuz But his question did state he was using GCC. So syntax wouldn't have been his issue, and the syntax he did use is valid with GCC extensions. – Michael Petch May 07 '18 at 08:01
  • @fuz I'd already tried it; if I write `(address_rip & 0b0001110000000000) == 0b01000000000` I get `call: 675 ret: 651`. So, again, not the same value. – Paul-Marie May 07 '18 at 09:19
  • @VolontéDuPeuple Oh yeah, count your zeroes. I count ten zeroes to the right, which would indicate a field from bit 2 to bit 4 in the second byte. As my comment and the Intel manuals indicate, the reg field is from bit 3 to bit 5, so you are off by one. Also, note that you can't expect the numbers of calls and returns to match up exactly as there are a few situations where calls without corresponding returns occur. – fuz May 07 '18 at 10:21
  • If the code you're trying to trace uses exceptions, `longjmp`, or even `exit`, then the number of call instructions executed won't equal the number of return instructions executed. If the code you're trying to trace has any defence against being reverse engineered, then these instructions likely won't be nicely paired either. – Ross Ridge May 07 '18 at 14:52

2 Answers


Here is an example of how to program this. Note that as an x86 instruction can be up to 15 bytes long, you must peek at least 15 bytes to be sure to get a complete instruction. As each peek reads 8 bytes, this means that you need to peek twice, once at regs.rip and once 8 bytes later:

peek1 = ptrace(PTRACE_PEEKDATA, pid, regs.rip, NULL);
peek2 = ptrace(PTRACE_PEEKDATA, pid, regs.rip + sizeof(long), NULL);

Note that this code glosses over a lot of details about how prefixes are handled and detects a bunch of invalid instructions as function calls. Note further that the code needs to be changed to also incorporate some more CALL instructions and to remove the detection of REX prefixes if you want to use it for 32 bit code:

#include <stddef.h> /* size_t */

int iscall(long peek1, long peek2)
{
        union {
                long longs[2];
                unsigned char bytes[16];
        } data;

        int opcode, reg; 
        size_t offset;

        /* turn peeked longs into bytes */
        data.longs[0] = peek1;
        data.longs[1] = peek2;

        /* ignore relevant prefixes */
        for (offset = 0; offset < sizeof data.bytes &&
            ((data.bytes[offset] & 0xe7) == 0x26 /* cs, ds, ss, es override */
            || (data.bytes[offset] & 0xfc) == 0x64 /* fs, gs, addr32, data16 override */
            || (data.bytes[offset] & 0xf0) == 0x40); /* REX prefix */
            offset++)
                ;

        /* instruction is composed of all prefixes */
        if (offset > 15)
                return (0);

        opcode = data.bytes[offset];


        /* E8: CALL NEAR rel32? */
        if (opcode == 0xe8)
                return (1);

        /* sufficient space for modr/m byte? */
        if (offset > 14)
                return (0);

        reg = data.bytes[offset + 1] & 0070; /* modr/m byte, reg field */

        if (opcode == 0xff) {
                /* FF /2: CALL NEAR r/m64? */
                if (reg == 0020)
                        return (1);

                /* FF /3: CALL FAR r/m32 or r/m64? */
                if (reg == 0030)
                        return (1);
        }

        /* not a CALL instruction */
        return (0);
}
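The two peeks can be wrapped in a small helper that fills a 16-byte buffer. This is a sketch rather than part of the answer's code: `read_insn_bytes` is a hypothetical name. Because a peeked word can legitimately be -1, errno has to be cleared before each PTRACE_PEEKDATA and checked afterwards, as the ptrace(2) man page describes. Also note that if the instruction sits near the end of a mapped region, the second peek may fail even though the instruction itself is readable.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Sketch: copy 16 bytes starting at addr in the tracee into buf, one
 * PTRACE_PEEKDATA per long (two peeks when long is 8 bytes, as on x86_64).
 * Returns 0 on success, -1 (with errno set) if a peek fails. */
int read_insn_bytes(pid_t pid, unsigned long addr, unsigned char buf[16])
{
        for (size_t i = 0; i < 16 / sizeof(long); i++) {
                errno = 0;      /* -1 is also a valid word value */
                long word = ptrace(PTRACE_PEEKDATA, pid,
                                   (void *)(addr + i * sizeof(long)), NULL);
                if (word == -1 && errno != 0)
                        return -1;
                memcpy(buf + i * sizeof(long), &word, sizeof word);
        }
        return 0;
}
```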
fuz
  • `rep ret` is common in gcc-generated machine code, so you need to handle that prefix, too. [What does \`rep ret\` mean?](//stackoverflow.com/q/20526361) (Also, if you want to correctly decode lengths of other instructions, you need to also look for `lock` prefixes. At least in 64-bit mode, VEX prefixes don't alias valid instructions...) – Peter Cordes May 07 '18 at 12:19
  • @PeterCordes I only detect `call` instructions, so `rep ret` is irrelevant. I have never seen `rep call` and I hope I won't see that either. VEX and EVEX prefixes don't collide with `call` instructions which is why I left them out. Decoding instruction length correctly is not a goal of this function. – fuz May 07 '18 at 12:20
  • Should I use this function like that?: `long long result = 0; result = ptrace(PTRACE_PEEKDATA, pid, regs.rip, NULL); if (iscall(result & 0xFFFFFFFF00000000, result & 0x00000000FFFFFFFF)) call++;` ? – Paul-Marie May 07 '18 at 12:21
  • Oops, didn't notice you were only detecting `call`, not `ret`. Fair point. – Peter Cordes May 07 '18 at 12:21
  • @VolontéDuPeuple No. Read the answer exactly: you need to peek twice (once at `regs.rip` and once at `regs.rip + sizeof (long)`) and give both values to `iscall`. I have no idea what `result & 0x00000000FFFFFFFF` is trying to achieve. – fuz May 07 '18 at 12:22
  • Oh, and I forgot this was being used to check an instruction given that you already know you're at an instruction boundary, not as a disassembler. So you don't need to worry about finding opcodes buried as immediates for other isns and so on. This looks good for the requirements in the question, +1. – Peter Cordes May 07 '18 at 12:23
  • `ld` will relax `gcc -fno-plt`'s `call *foo@GOTPCREL(%rip)` to `67 call foo` if `foo` is found in an object file you're linking. See the example at the bottom of [Can't call C standard library function on 64-bit Linux from assembly (yasm) code](https://stackoverflow.com/a/52131094) of `67 e8` as the start of a `call rel32`. (@Paul-Marie). On my Arch GNU/Linux system, that machine code is not rare in disassembly of some real binaries like `objdump -drwC -Mintel /bin/bash | grep '67 e8 '`. But much rarer in other binaries; depends on compile / link options and maybe the source. – Peter Cordes Feb 03 '23 at 18:08
  • @PeterCordes Sorry, but that was **5 years ago** and I haven't done much assembly in recent years, so I can only take your word for it. – Paul-Marie Feb 03 '23 at 22:07
  • @Paul-Marie: I just pinged you in case you were still using this exact code for anything and wanted to fix possible false negatives. No worries if you're not still interested in this problem; Mainly was directed at fuz and future readers. – Peter Cordes Feb 04 '23 at 00:21
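iscall only recognizes calls. A matching sketch for returns could skip the same prefixes plus 0xF2/0xF3, so that the `rep ret` idiom found in GCC-generated code (F3 C3) is still counted. The name `isret` and its byte-buffer interface (for example a buffer filled from the two peeks) are assumptions for this example, and IRET (0xCF) is deliberately not treated as a return.

```c
#include <stddef.h>

/* Sketch: return 1 if the bytes start with a RET instruction (near or far,
 * with or without an imm16), 0 otherwise. */
int isret(const unsigned char *bytes, size_t len)
{
        size_t offset = 0;

        if (len > 15)           /* an x86 instruction is at most 15 bytes */
                len = 15;

        /* skip prefixes: es/cs/ss/ds, fs/gs/data16/addr32, repne(bnd)/rep, REX */
        while (offset < len &&
               ((bytes[offset] & 0xe7) == 0x26
                || (bytes[offset] & 0xfc) == 0x64
                || (bytes[offset] & 0xfe) == 0xf2
                || (bytes[offset] & 0xf0) == 0x40))
                offset++;

        if (offset == len)      /* nothing but prefixes */
                return 0;

        switch (bytes[offset]) {
        case 0xc2: case 0xc3:   /* RET near, with / without imm16 */
        case 0xca: case 0xcb:   /* RET far, with / without imm16 */
                return 1;
        default:
                return 0;
        }
}
```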

I would personally run the tracing one instruction "late", retaining rip and rsp from the previous step. For simplicity, let's say curr_rip and curr_rsp are the rip and rsp registers obtained from the most recent PTRACE_GETREGS, and prev_rip and prev_rsp from the previous one.

If (curr_rip < prev_rip || curr_rip > prev_rip + 16), then the instruction pointer either went backwards, or forwards by more than any single instruction can advance it (an x86 instruction is at most 15 bytes long, so 16 is a safe bound). If so, then:

  • If (curr_rsp > prev_rsp), the last instruction was a ret of some kind, because data was also popped off the stack.

  • If (curr_rsp < prev_rsp), the last instruction was a call of some kind, because data was also pushed to the stack.

  • If (curr_rsp == prev_rsp), the instruction was some sort of a jump; either unconditional jump, or a branch.

In other words, you only need to inspect the instruction starting at prev_rip (when execution simply fell through, it is curr_rip - prev_rip bytes long, at most 15) when (curr_rsp != prev_rsp && curr_rip > prev_rip && curr_rip <= prev_rip + 16). For this, I'd use Intel XED, but you are free to implement your own call/ret instruction recognizer, of course.
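The rip/rsp rule above can be written as a small pure function. This is a sketch: the enum, the name `classify_step` and the `MAX_STEP` constant (the answer's bound of 16) are illustrative, and, like the answer, it has not been checked against signal delivery. One small difference: it also asks for decoding when rip did not move at all but rsp changed, a case the answer's conditions leave open.

```c
#include <stdint.h>

#define MAX_STEP 16     /* the answer's bound; x86 instructions are <= 15 bytes */

enum step_kind {
        STEP_PLAIN,     /* rip stayed close, rsp unchanged: nothing to count */
        STEP_CALL,      /* rip jumped and rsp decreased (return address pushed) */
        STEP_RET,       /* rip jumped and rsp increased (return address popped) */
        STEP_JUMP,      /* rip jumped, rsp unchanged: jmp or taken branch */
        STEP_DECODE     /* rsp changed but rip stayed close: decode insn at prev_rip */
};

enum step_kind classify_step(uint64_t prev_rip, uint64_t prev_rsp,
                             uint64_t curr_rip, uint64_t curr_rsp)
{
        int jumped = curr_rip < prev_rip || curr_rip > prev_rip + MAX_STEP;

        if (jumped) {
                if (curr_rsp > prev_rsp)
                        return STEP_RET;
                if (curr_rsp < prev_rsp)
                        return STEP_CALL;
                return STEP_JUMP;
        }
        /* rip moved at most MAX_STEP bytes forward: only an rsp change
         * (push, pop, or a very near call/ret) needs the instruction decoded */
        return curr_rsp != prev_rsp ? STEP_DECODE : STEP_PLAIN;
}
```

Only STEP_DECODE requires peeking at prev_rip; the other results can be counted directly from the registers returned by PTRACE_GETREGS.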

Nominal Animal
  • It seems fine, but with your technique, how can you tell a CALL/RET apart from a PUSH/POP? Some programs use the stack not only to save the return address but also for their own data, so how can you differentiate them? Personally, I check the symbol address of the mnemonic's argument to see which function it goes into; even if it's currently not working, I feel it's "safer". – Paul-Marie May 08 '18 at 13:04
  • @VolontéDuPeuple: `push` and `pop` instructions are those that need to be inspected, because `curr_rip > prev_rip && curr_rip <= prev_rip + 16` and `curr_rsp != prev_rsp`. Even then, you don't need to detect them per se; you only need to be able to detect `call` (`call near`/`call far`) and `ret` (`ret near`/`ret far`). There is nothing inherently "safer" about relying on the instruction mnemonic alone; my approach just saves two (relatively slow) `ptrace(PTRACE_PEEKTEXT,...)` calls per instruction (except for stack-related operations and very near calls and rets). – Nominal Animal May 08 '18 at 13:36
  • That said, I have not checked how a signal delivery to a signal handler in the tracee would appear. – Nominal Animal May 08 '18 at 13:39
  • That could work but seems like it could generate some false negatives for very short functions. – fuz May 08 '18 at 19:59
  • @fuz: How? The call is seen as a change in `rsp`. If `rip` changes enough, it is detected as-is, otherwise the insn (at `prev_rip`) that caused the change needs to be decoded. The same at ret. I don't see how the function length affects any of this at all. Or is my final paragraph unclear? It should describe the case where inspecting the instruction is necessary; examining `rip` and `rsp` only is not always decisive. – Nominal Animal May 08 '18 at 22:35
  • @NominalAnimal Oh, I thought your intent was to never ever actually decode an instruction. Makes sense this way. – fuz May 08 '18 at 23:00
  • For what it's worth, I implemented a simple `ret`/`call` detector, and compared the number of calls and rets with and without the register optimization described in my answer. Using `date` from coreutils-8.25-2ubuntu3~16.04 on amd64, the always-inspect-the-instruction method detected 1522 calls and 1409 rets; the method outlined in this answer detected 1533 calls and 1529 rets. I'm not sure whether the difference indicates an error in my instruction detector or something else; I should've used Intel XED. – Nominal Animal May 09 '18 at 03:56