
I want to measure how many bytes of memory are accessed by programs running on Linux (x86_64 architecture). I use the perf tool to dump output like this:

                 :      ffffffff81484700 <load2+0x484700>:
    2.86 :        ffffffff8148473b:       41 8b 57 04             mov    0x4(%r15),%edx
    5.71 :        ffffffff81484800:       65 8b 3c 25 1c b0 00    mov    %gs:0xb01c,%edi
   22.86 :        ffffffff814848a0:       42 8b b4 39 80 00 00    mov    0x80(%rcx,%r15,1),%esi
   25.71 :        ffffffff814848d8:       42 8b b4 39 80 00 00    mov    0x80(%rcx,%r15,1),%esi
    2.86 :        ffffffff81484947:       80 bb b0 00 00 00 00    cmpb   $0x0,0xb0(%rbx)
    2.86 :        ffffffff81484954:       83 bb 88 03 00 00 01    cmpl   $0x1,0x388(%rbx)
    5.71 :        ffffffff81484978:       80 79 40 00             cmpb   $0x0,0x40(%rcx)
    2.86 :        ffffffff8148497e:       48 8b 7c 24 08          mov    0x8(%rsp),%rdi
    5.71 :        ffffffff8148499b:       8b 71 34                mov    0x34(%rcx),%esi
    5.71 :        ffffffff814849a4:       0f af 34 24             imul   (%rsp),%esi

My current method is to parse this file and find every instruction that accesses memory, such as mov, cmp, etc., then calculate the number of bytes each instruction accesses. For example, mov 0x4(%r15),%edx adds 4 bytes.

Is there a way to calculate this from the machine code instead? For example, by analyzing "41 8b 57 04", I could also add 4 bytes. I am not familiar with x86_64 machine code, so could anyone give me some clues? Or is there a better way to gather these statistics? Thanks in advance!

Nan Xiao
  • Sounds like a job for Valgrind. – Seva Alekseyev Feb 02 '15 at 03:25
  • @SevaAlekseyev:Could you give detailed information? Thanks! – Nan Xiao Feb 02 '15 at 03:32
  • Your requirements are unclear. Do you want to know the distribution of byte values fetched (you know the machine often fetches values of other sizes)? The distribution of addresses used by a running program? ... of addresses found in the object code, ignoring execution? FWIW, one can use "machine code" to tear apart individual machine instructions, just like you can use C to do the same thing (C code in fact is compiled to machine code, and thus a C program to do this implicitly provides a machine program to do this). Provide more explanation and some more examples. – Ira Baxter Feb 02 '15 at 05:12
  • @IraBaxter: OK, thanks for your comments! I will elaborate: I want to measure the average number of memory bytes accessed per instruction. Because a running program has many branches and uncertainties, I use `perf` to gather statistics. Using the example above, the average memory access bytes per instruction should be: `2.86% * 4 + ...... + 5.71% * 8`. Is that clearer? – Nan Xiao Feb 02 '15 at 05:27
  • Still unclear. You want to know: how many data bytes are fetched by each instruction, on average? Or, how many bytes make up each instruction (that access memory)? You want this computed from runtime data, or only from static analysis of the object code? [What do you intend to do with this data?] – Ira Baxter Feb 02 '15 at 05:38
  • @IraBaxter: OK. I am not a native English speaker, so please excuse me. Here is the background: I want to calculate the total number of memory bytes accessed when executing a program: 1) I use `perf` to generate a file that contains all memory-access assembly instructions and their percentage ratios; 2) multiplying each instruction's percentage ratio by its memory access bytes and summing gives the average memory access bytes per instruction; 3) calculate the total memory access bytes. My concern is the second step. How can I determine the memory access bytes from the assembly/machine code? – Nan Xiao Feb 02 '15 at 06:03
  • Now clear. See my answer. – Ira Baxter Feb 02 '15 at 08:15
  • Note that it should be possible to use the CPU's performance monitoring counters to measure the actual number of memory accesses (instead of using a convoluted estimation) and this would be more accurate and more specific (e.g. you'd be able to split it into categories - instruction fetch, cache misses, speculative fetches, etc). – Brendan Feb 02 '15 at 08:57
  • @Brendan: I have checked `MEM_UOP_RETIRED.ALL_LOADS` and `MEM_UOP_RETIRED.ALL_STORES` hardware events, but the count register may overflow. – Nan Xiao Feb 02 '15 at 09:29

1 Answer

See https://stackoverflow.com/a/20319753/120163 for information about decoding Intel instructions. In practice, you really need to refer to the Intel reference manuals: http://download.intel.com/design/intarch/manuals/24319101.pdf. If you only want to do this manually for a few instructions, you can just look up the data in these manuals.

If you want to automate the computation of the total memory accessed by instructions, you will need a function that maps each instruction to the amount of data it accesses. Since the instruction set is complex, the corresponding function will be complex and will take you a long time to write from scratch.
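As a rough illustration of such a mapping function, here is a minimal Python sketch that works on the AT&T-syntax text in the perf listing rather than on the raw bytes. Everything in it is an assumption made for illustration: the names `access_bytes`, `weighted_average`, and `BASE_MNEMONICS` are hypothetical, only a handful of mnemonics are recognised, unknown instructions count as 0 bytes, and a read-modify-write instruction is counted as a single access. It takes the width from the size suffix (`cmpb`, `cmpl`) when there is one, and otherwise from the register operand.

```python
import re

# Mnemonics (without size suffix) this sketch recognises; extend as needed.
BASE_MNEMONICS = {"mov", "cmp", "imul", "add", "sub", "and", "or", "xor", "test"}
SUFFIX_BYTES = {"b": 1, "w": 2, "l": 4, "q": 8}

# One perf annotate line: percent, address, raw bytes, mnemonic, operands.
LINE_RE = re.compile(
    r"^\s*([\d.]+)\s*:\s*[0-9a-f]+:\s+(?:[0-9a-f]{2}\s)+\s*(\S+)\s*(.*)$"
)

def split_operands(text):
    """Split an AT&T operand string on commas outside parentheses."""
    parts, depth, current = [], 0, ""
    for ch in text.strip():
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    return parts

def is_memory(op):
    """Memory operands have parentheses, a segment override, or a bare address."""
    return "(" in op or ":" in op or not op.startswith(("$", "%"))

def reg_bytes(reg):
    """Width in bytes of a general-purpose register name such as %edx."""
    name = reg.lstrip("%")
    m = re.fullmatch(r"r(\d+)([dwb]?)", name)  # %r8 .. %r15 and sub-registers
    if m:
        return {"": 8, "d": 4, "w": 2, "b": 1}[m.group(2)]
    if name.startswith("r"):
        return 8
    if name.startswith("e"):
        return 4
    if name.endswith(("l", "h")):  # %al, %ah, %sil, ...
        return 1
    return 2

def access_bytes(mnemonic, operand_text):
    """Bytes of memory touched by one instruction (0 if none or unknown)."""
    ops = split_operands(operand_text)
    if not any(is_memory(op) for op in ops):
        return 0
    if mnemonic in BASE_MNEMONICS:
        # No size suffix: the register operand fixes the width.
        regs = [op for op in ops if op.startswith("%") and ":" not in op]
        return reg_bytes(regs[0]) if regs else 0
    if mnemonic[:-1] in BASE_MNEMONICS and mnemonic[-1] in SUFFIX_BYTES:
        return SUFFIX_BYTES[mnemonic[-1]]
    return 0  # Simplification: unrecognised instructions are ignored.

def weighted_average(listing):
    """Sum of (percent / 100) * bytes over all parsable listing lines."""
    total = 0.0
    for line in listing.splitlines():
        m = LINE_RE.match(line)
        if m:
            percent, mnemonic, operands = m.groups()
            total += float(percent) / 100.0 * access_bytes(mnemonic, operands)
    return total
```

Calling `access_bytes("mov", "0x4(%r15),%edx")` on the first instruction of the listing gives 4, matching the manual count in the question. Instructions with implicit memory operands (push, pop, call, string instructions) are not handled and would need their own entries.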

My SO answer https://stackoverflow.com/a/23843450/120163 provides C code that maps x86-32 instructions to their length, given a buffer that contains a block of binary code. Such code is necessary if one is to start at some point in the object code buffer and simply enumerate the instructions that are being used. (This code has been used in production; it is pretty solid.) This routine was built basically by reading the Intel reference manual very carefully. For OP, this would have to be extended to x86-64, which shouldn't be very hard; mostly you have to account for the extended-register prefix opcode byte and some differences from x86-32.

To solve OP's problem, one would also modify this routine to return the number of bytes read by each individual instruction. This latter data also has to be extracted by careful inspection of the Intel reference manuals.
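To give a feel for what that extraction involves, here is a minimal Python sketch (not the C routine from the linked answer) that derives the memory operand size from the raw bytes for a very small set of opcodes. The function name `mem_access_bytes` and the two opcode tables are hypothetical and cover only the forms that appear in the question's listing plus their store counterparts. The rule it applies: byte-sized opcodes access 1 byte; otherwise a REX prefix with the W bit set means 8 bytes, a 0x66 prefix means 2 bytes, and the default is 4 bytes. A ModRM byte whose top two bits are 11 selects a register, so no memory is accessed.

```python
# Legacy prefixes: segment overrides, operand/address size, lock, rep.
LEGACY_PREFIXES = {0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65, 0x66, 0x67, 0xF0, 0xF2, 0xF3}

# Opcodes with a ModRM byte and a fixed 8-bit operand (illustrative subset).
BYTE_OPCODES = {(0x88,), (0x8A,), (0x80,)}
# Opcodes with a ModRM byte and a 16/32/64-bit operand (illustrative subset).
FULL_OPCODES = {(0x89,), (0x8B,), (0x81,), (0x83,), (0x39,), (0x3B,), (0x0F, 0xAF)}

def mem_access_bytes(code):
    """Return the memory operand size in bytes for one encoded instruction.

    `code` is a bytes object starting at the first byte of the instruction.
    Raises ValueError for opcodes outside the small tables above.
    """
    i = 0
    opsize16 = False
    # Skip legacy prefixes, remembering the operand-size override (0x66).
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        opsize16 = opsize16 or code[i] == 0x66
        i += 1
    # Optional REX prefix (0x40..0x4F); bit 3 is REX.W (64-bit operand).
    rex_w = False
    if i < len(code) and 0x40 <= code[i] <= 0x4F:
        rex_w = bool(code[i] & 0x08)
        i += 1
    # One-byte opcode, or two-byte opcode introduced by 0x0F.
    width = 2 if i < len(code) and code[i] == 0x0F else 1
    opcode = tuple(code[i:i + width])
    i += width
    if opcode not in BYTE_OPCODES and opcode not in FULL_OPCODES:
        raise ValueError("opcode not covered by this sketch")
    if i >= len(code):
        raise ValueError("missing ModRM byte")
    if code[i] >> 6 == 3:
        return 0  # mod == 11: register operand, no memory access
    if opcode in BYTE_OPCODES:
        return 1
    if rex_w:
        return 8
    return 2 if opsize16 else 4
```

For the bytes from the question, `mem_access_bytes(bytes.fromhex("41 8b 57 04"))` gives 4. Only the prefixes, the opcode, and the ModRM byte are inspected, so the result does not depend on the displacement or immediate bytes that follow.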

OP also has to worry about where he gets the object code from; if he doesn't run this routine in the address space of the object code itself, he will need to somehow get this object code from the executable file (on Linux, an ELF file rather than a Windows .exe). For that, he needs to build or run the equivalent of the Windows loader, and I'll bet that has a bunch of dark corners. Check out the format of object code files.

Ira Baxter