
I'm trying to collect an instruction trace of a program with Intel's Pin (v3.25). The commands I'm running:

pin -t obj-intel64/inscount0.so -- <my binary>
perf stat --event=instructions:{k,u} -- <my binary>

However, the two instruction counts are wildly different (about 10x), with perf reporting far more than Pin.

When I used the itrace.so tool and then replayed the instructions, I also saw far fewer instructions than I would have expected for the program. That made me think something might be wrong with my Pin setup, but I'm not sure how to debug it from here. Any advice?

jkang
  • Did you tell `perf` to count all instructions, including kernel instructions? Pin can of course only instrument user-space, so to match that you should use `perf stat --all-user` to apply `:u` to all the events. Or `perf stat -e instructions:u ./a.out` . I guess you're separately counting user and kernel? For examples, see [How do I determine the number of x86 machine instructions executed in a C program?](https://stackoverflow.com/q/54355631) / [Benchmarking - How to count number of instructions sent to CPU to find consumed MIPS](https://stackoverflow.com/q/50019857) – Peter Cordes Nov 04 '22 at 08:07
  • SDE has instruction-counting using PIN, see [How to characterize a workload by obtaining the instruction type breakdown?](https://stackoverflow.com/q/58243626) for an example of that. – Peter Cordes Nov 04 '22 at 08:07
  • Thanks. Yes I'm separating user vs kernel. The problem is that even userspace inscount is off by something like 10x. I manually wrote a small loop program and the two counts match. So I'm guessing it has to do with my app (multithreaded)? – jkang Nov 04 '22 at 17:56
  • Ok, your question didn't include that detail. Should be easy to test with a single-threaded program like `awk 'BEGIN{for(i=0;i<10000000;i++){}}'` to see if Pin matches `perf` user-space counts for that. IIRC, SDE does. – Peter Cordes Nov 04 '22 at 18:47
  • Ya sorry it just occurred to me that that may be the difference. I did do a simple loop program that's just a bunch of multiplies and perf matches PIN. Does PIN support instrumenting multiple threads? – jkang Nov 04 '22 at 20:07
  • IDK, but I think in general it does. But maybe `obj-intel64/inscount0.so` doesn't. – Peter Cordes Nov 04 '22 at 20:11
  • Gotcha thanks. I do notice that obj-intel64/inscount0.so uses a global variable to count the instructions. Do you have any details as to how PIN instruments instructions? Is a global variable in the instrumentation code shared across all threads? – jkang Nov 04 '22 at 20:30
  • I haven't done much with Pin, I don't know its internals. I do know Intel's SDE is based on it, and I expect it's more robust in terms of multithreading. https://www.intel.com/content/www/us/en/developer/articles/tool/software-development-emulator.html – Peter Cordes Nov 04 '22 at 20:32
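
On the question raised in the comments about a global counter being shared across threads: below is a minimal standalone C++ sketch, not Pin code and not taken from the inscount0 source. It only illustrates the general hazard. A plain `counter++` on an ordinary global, executed by several threads at once, is a data race and can lose increments, which would show up as an undercount. The sketch shows two common ways to keep such a count exact: a shared atomic counter, and one slot per thread summed at the end. The function names are hypothetical, and whether this effect explains the 10x gap in the question is an assumption that would still need to be checked.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a tool-wide counter; this is NOT Pin API.
// An atomic makes concurrent increments well-defined and lossless.
static std::atomic<std::uint64_t> shared_count{0};

// Every thread increments one shared atomic counter.
std::uint64_t count_with_shared_atomic(unsigned threads, std::uint64_t per_thread) {
    shared_count.store(0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([per_thread] {
            for (std::uint64_t i = 0; i < per_thread; ++i) {
                shared_count.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return shared_count.load();
}

// Each thread owns one slot, so no two threads write the same variable.
// The slots are summed only after all threads have been joined.
std::uint64_t count_with_per_thread_slots(unsigned threads, std::uint64_t per_thread) {
    std::vector<std::uint64_t> slots(threads, 0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&slots, t, per_thread] {
            for (std::uint64_t i = 0; i < per_thread; ++i) {
                ++slots[t];
            }
        });
    }
    for (auto& th : pool) th.join();
    std::uint64_t total = 0;
    for (std::uint64_t v : slots) total += v;
    return total;
}
```

With 4 threads and 100000 increments each, both functions return 400000. The per-thread version avoids contention on a single variable, at the cost of a final summation step.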
