3

I want to count the total number of instructions executed when running /bin/ls. I used 3 methods whose results differ heavily, and I don't have a clue why.

1. Instruction counting with ptrace

I wrote a piece of code that starts an instance of ls and single-steps through it with ptrace:

#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <sys/user.h>
#include <sys/reg.h>    
#include <sys/syscall.h>

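int main(void)
{
    pid_t child = fork(); // create child
    if (child < 0) {
        perror("fork");
        return 1;
    }

    if (child == 0) {
        // child: ask to be traced, then replace this process with ls
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        char *child_argv[] = {"/bin/ls", NULL};
        execv("/bin/ls", child_argv);
        perror("execv"); // only reached if execv failed
        _exit(127);
    } else {
        int status;
        long long ins_count = 0;
        while (1) {
            // stop tracing once the child has terminated
            waitpid(child, &status, 0);
            if (WIFEXITED(status) || WIFSIGNALED(status))
                break;

            // every stop counts as one step, including the initial exec stop
            ins_count++;
            ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
        }

        printf("\n%lld Instructions executed.\n", ins_count);
    }

    return 0;
}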

Running this code gives me 516,678 instructions executed.

2. QEMU singlestepping

I simulated ls using qemu in singlestep mode and logged all incoming instructions into a log file using the following command:

qemu-x86_64 -singlestep -D logfile -d in_asm /bin/ls

According to qemu, ls executes 16,836 instructions.

3. perf

sudo perf stat ls

This command gave me 8,162,180 instructions executed.

I know that most of these instructions come from the dynamic linker and it is fine that they get counted. But why do these numbers differ so much? Shouldn't they all be the same?

Sbardila
  • 3
    Your own program will count the instructions of `ld-linux.so` but not instructions of the system calls (`syscall` is counted as one single instruction). I guess (I don't know) that `perf` will count the instructions in the Linux kernel (so one `syscall` is thousands of instructions) and "qemu" does not count `ld-linux.so`. – Martin Rosenau Nov 13 '20 at 20:06
  • Thank you! So if my code skips the instructions of "ld-linux.so", I should get the same count as qemu? – Sbardila Nov 14 '20 at 10:01

3 Answers

2

Your method of counting instructions with qemu is wrong. The in_asm option only shows the instructions of a block when that block is translated. After qemu's TB (translation block) chaining, execution jumps directly to the already-translated block without logging anything, so the count from qemu comes out lower than the counts from the other tools. A better approach in practice is to use -d nochain,exec together with the -singlestep option.
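To illustrate why a translation-time log undercounts, here is a toy model in C. It is not QEMU code: `run_toy_loop`, `struct toy_counts` and `NUM_BLOCKS` are made-up names, and the "guest program" is just a loop over a fixed set of blocks. With -singlestep each block would hold one instruction, so a block count stands in for an instruction count here.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_BLOCKS 4 /* hypothetical number of distinct guest blocks */

/* Toy model, not QEMU code: a block is "translated" the first time it is
   reached and is served from a cache on every later execution. */
struct toy_counts {
    long translated; /* what a translation-time log (like in_asm) would see */
    long executed;   /* what an execution-time log (like exec) would see */
};

struct toy_counts run_toy_loop(int iterations)
{
    assert(iterations >= 0);

    bool cached[NUM_BLOCKS] = { false };
    struct toy_counts c = { 0, 0 };

    for (int i = 0; i < iterations; i++) {
        for (int b = 0; b < NUM_BLOCKS; b++) {
            if (!cached[b]) {
                /* first visit: translate and log the block once */
                cached[b] = true;
                c.translated++;
            }
            /* every visit: the block is executed */
            c.executed++;
        }
    }
    return c;
}
```

In this model, a loop that runs 1000 times over 4 blocks is logged 4 times at translation but executed 4000 times, which is the same kind of gap as between the in_asm log and the other tools.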

Even so, the instruction counts still differ between these tools. I tried running qemu from different directories to produce those logs, with a statically linked guest program, and the log files showed different instruction counts. It may be that some glibc startup or init code depends on the environment and arguments, which causes this difference.

wen liang
1

Why do these instruction counts differ so much? Because they really measure different things, and only the unit of measure is the same. It's as if you were weighing something you brought from the store, and one person weighed everything without the packaging or even the stickers, another weighed it in its packaging and included the shopping bags too, and yet another also added the mud you brought into the house on your boots.

That's pretty much what is happening here: the instruction counts are not the instruction counts only of what's inside the ls binary, but can also include the libraries it uses, the services of the kernel loader needed to bring those libraries in, and finally the code executed in the process but in the kernel context. The methods you used all behave differently in that respect. So the question is: what do you need out of that measurement? If you need the "total effort", then certainly the largest number is what you want: this will include some of the overhead caused by the kernel. If you need the "I just want to know what happened in ls", then the smallest number is the one you want.

Kuba hasn't forgotten Monica
  • My goal is to iterate through the same instructions as qemu does. Qemu can dump registers while iterating through the instructions of a program. I want to do the same with my ptrace code: iterate through the same instructions as qemu and dump the registers for comparison. Unfortunately there is not much documentation for qemu, and I am not experienced enough to understand it by looking at the qemu source. – Sbardila Nov 14 '20 at 10:01
  • Then you have to use qemu. Period. You won't be able to do it with ptrace - why do you think you can? It's not possible. qemu can emulate all the way down to the level of a virtual machine, i.e. it emulates an entire PC, and will emulate the kernel, so it has the "insight" into all levels of execution. It can be used to only do partial emulation, e.g. to run ARM linux code on an Intel CPU, but it has to be set up that way, and you're mum on exact setup. `ptrace` does not trace into the kernel code, so that's that. – Kuba hasn't forgotten Monica Nov 16 '20 at 19:21
  • 2
    The specific answer in the case of the QEMU part of this is "the 'in_asm' log is not a count of total executed instructions at all!" -- see the longer explanation in my answer to https://stackoverflow.com/questions/64847254/what-instructions-does-qemu-trace – Peter Maydell Nov 17 '20 at 13:34
1

Your program using PTRACE_SINGLESTEP should count all user-space instructions executed in the process. A syscall instruction counts as one because you can't single-step into the kernel; that's opaque to ptrace.
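As a rough sketch of that point, the tracer loop can also look at the instruction about to execute at each stop and tally how many of the steps are `syscall` instructions (bytes 0F 05). This assumes x86-64 Linux; `step_and_count` is a hypothetical helper name, not something from the question, and on other targets it simply returns -1.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: single-step argv[0] and report how many stops were
   seen and how many of them were about to execute a `syscall` instruction.
   Returns 0 on success, -1 on error or when not built for x86-64 Linux. */
int step_and_count(char *const argv[], long long *steps, long long *syscalls)
{
    assert(argv && argv[0] && steps && syscalls);
#if defined(__x86_64__) && defined(__linux__)
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execv(argv[0], argv);
        _exit(127); /* only reached if execv failed */
    }

    *steps = 0;
    *syscalls = 0;
    for (;;) {
        int status;
        if (waitpid(child, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;

        /* rip points at the next user-space instruction to execute */
        struct user_regs_struct regs;
        if (ptrace(PTRACE_GETREGS, child, NULL, &regs) == 0) {
            errno = 0;
            long word = ptrace(PTRACE_PEEKTEXT, child, (void *)regs.rip, NULL);
            /* little-endian: bytes 0F 05 read as 0x050f in the low 16 bits */
            if (errno == 0 && (word & 0xffff) == 0x050f)
                (*syscalls)++;
        }

        /* one stop is one step, however much kernel code a syscall runs */
        (*steps)++;
        if (ptrace(PTRACE_SINGLESTEP, child, NULL, NULL) < 0)
            return -1;
    }
    return 0;
#else
    (void)argv;
    return -1;
#endif
}
```

Called with an argv such as {"/bin/ls", NULL}, the `syscalls` tally shows how many of the counted steps each hide an arbitrary amount of kernel work. The `regs` structure read at each stop is also where a register dump for comparison with another tool could be taken.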

The ptrace count should be pretty similar to what perf stat --all-user or perf stat -e instructions:u reports, since those count only user-space instructions (probably the same to within a few instructions out of however many millions). The --all-user option or the :u event modifier tells perf to program the hardware performance counters to count the event only while the CPU is not at privilege level 0 (kernel mode). Modern x86 CPUs have hardware support for this, so perf doesn't have to run instructions inside the kernel on every transition to stop and restart the counters.
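For reference, roughly the same user-only counting can be requested directly through the Linux perf_event_open system call. The following is a minimal sketch, assuming Linux with kernel headers installed and a usable hardware PMU; `count_user_instructions` is a hypothetical helper name. It returns -1 when the counter cannot be opened (for example in a VM without PMU access or under a restrictive perf_event_paranoid setting).

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] and return the number of user-space
   instructions counted for it, or -1 if the counter could not be used. */
long long count_user_instructions(char *const argv[])
{
    assert(argv && argv[0]);

    int gate[2];
    if (pipe(gate) < 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(gate[0]);
        close(gate[1]);
        return -1;
    }
    if (child == 0) {
        char go;
        close(gate[1]);
        /* block until the parent has attached the counter (EOF on the pipe) */
        if (read(gate[0], &go, 1) < 0)
            _exit(126);
        close(gate[0]);
        execv(argv[0], argv);
        _exit(127); /* only reached if execv failed */
    }
    close(gate[0]);

    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;       /* start stopped ... */
    attr.enable_on_exec = 1; /* ... and start counting at the exec */
    attr.exclude_kernel = 1; /* user space only, like the :u modifier */
    attr.exclude_hv = 1;

    /* glibc has no wrapper for perf_event_open, so use syscall() */
    int fd = (int)syscall(SYS_perf_event_open, &attr, child, -1, -1, 0UL);
    close(gate[1]); /* release the child */

    int status;
    waitpid(child, &status, 0);
    if (fd < 0)
        return -1;

    uint64_t count = 0;
    ssize_t n = read(fd, &count, sizeof count);
    close(fd);
    return n == (ssize_t)sizeof count ? (long long)count : -1;
}
```

The pipe only delays the exec until the counter is attached, so that the count starts at the first user-space instruction of the new program image.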

Both of these include everything that happens in user-space, including ld-linux.so dynamic linker code that runs before execution reaches _start in a dynamic executable.

See also "How do I determine the number of x86 machine instructions executed in a C program?", which includes hand-written asm source for a static executable that only runs 2 instructions in user-space. perf stat --all-user counts 3 instructions for it on my Skylake. That Q&A also has a bunch of other discussion about what happens in a user-space process, and hopefully useful links.


Qemu counting is totally different because it does dynamic translation. See wen liang's answer and "What instructions does qemu trace?", which Peter Maydell linked in a comment on Kuba's answer here.

If you want to use a tool like this, you might want Intel's SDE, which uses Intel PIN dynamic instrumentation. It can histogram instruction types for you, as well as counting a total. See my answer on "How do I determine the number of x86 machine instructions executed in a C program?" for links.

Peter Cordes