Which processor instructions are used most commonly?

Question

My objective is to do a comparative study of a few instruction set architectures.
For each instruction set architecture, how can I find the most commonly used instructions?

These are the steps I am thinking of:

Find common ISAs for a chosen domain
Find popular programs for each such ISA
Disassemble the program instructions (.code) (which tool?)
Collect statistics on instruction format, opcode, type. (which tool?)

Here is a very good study on x86 machine code statistics: https://www.strchr.com/x86_machine_code_statistics

I tried the command below for disassembling, but it does not seem to disassemble properly. The disassembled output shows some das instructions, which should not be present in the actual code.

ndisasm -b32 -a $(which which)

A disassembler does not know whether something is data or code. They may even be the same – data can be treated as code, code as data (the von Neumann architecture as implemented in all current CPUs). That is why you cannot point a generic disassembler to a random piece of an executable and say, "disassemble this!" You *can* but, as you found, it *will* disassemble whatever you are pointing it to. – Jongware Feb 20 '20 at 18:30
Ah, that makes sense, but thankfully `objdump` recognizes ELF files. `objdump --disassemble $(which ls) > ls.log` seems to do the right disassembly. – wolfram77 Feb 20 '20 at 18:48
You cannot be 100% sure about that; there may still be static data – and even unused code! – inside executable sections. – Jongware Feb 20 '20 at 18:52
Then I guess an appropriate way to find out which instructions are being used would be to run the program under a debugger and somehow let it print the executed instructions to a file. What do you think? – wolfram77 Feb 20 '20 at 19:02
@wolfram77: I think there's a major difference between "generated by a compiler most often" and "executed by a CPU most often"; and you'll need to figure out which is better for your purposes. – Brendan Feb 20 '20 at 21:03
@usr2564301: Plain non-obfuscated compiler-generated x86 executables do disassemble easily. x86 compilers don't mix code and data; unlike ARM there's no benefit to literal pools near code (between functions) so compilers don't do it. Of course you have to use a disassembler like `objdump` or `objconv` that knows about ELF metadata, which `ndisasm` does not! ndisasm treats everything as a flat binary, including metadata and .data and .rodata. – Peter Cordes Feb 21 '20 at 00:55
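
The point about `ndisasm` treating the whole file as a flat binary can be checked on the file itself: an ELF executable begins with a header (the magic bytes 0x7f 'E' 'L' 'F'), not with code, so a flat disassembly starts by decoding metadata. A minimal sketch, assuming a POSIX shell with `od` and `tr`; the helper name `is_elf` is made up for illustration and is not part of any tool mentioned above.

```shell
# is_elf FILE
#   Succeeds if FILE starts with the ELF magic bytes 7f 45 4c 46.
#   (Hypothetical helper; it only inspects the first four bytes.)
is_elf() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}

# Possible usage: pick a section-aware disassembler for ELF input.
#   if is_elf "$(command -v ls)"; then
#       objdump -d "$(command -v ls)" > ls.log
#   fi
```

If the check succeeds, a tool that understands ELF sections (such as `objdump`) is the safer choice; feeding the same file to a flat-binary disassembler would also decode the headers and data sections as if they were instructions.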

score 4 · Accepted Answer · answered Feb 20 '20 at 18:43

You can try this to gather the mnemonics from the .text section:

objdump --no-show-raw-insn \
        -M intel           \
        -sDj .text $(which *program name*) | # <-- disassemble .text section
             sed -n '/<\.text>/, $ p'      | # <-- skip raw hex
             awk '{$1 = ""; print}'        | # <-- remove offsets
             sed '1d'                        # <-- delete annoying <.text> in first line

After that, you can either extract only the mnemonic names by appending awk '{print $1}' to the previous command, or process the output in some other way.

Finally, append sort | uniq -c to the previous steps to count the occurrences. My resulting command looked like this:

objdump --no-show-raw-insn \
        -M intel           \
        -sDj .text $(which *program name*) | 
             sed -n '/<\.text>/, $ p'      | 
             awk '{$1 = ""; print}'        |
             sed '1d'                      |
             awk '{print $1}' | sort | uniq -c

This prints the frequency of every mnemonic in the program's .text section.
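
The pipeline above depends on a `<.text>` label appearing in the listing, which may not be the case for binaries that still carry symbols. As a hedged alternative, here is a sketch that selects instruction lines by their shape instead. It assumes the GNU `objdump` layout where each instruction line is an address, a colon, a tab, and then the instruction text; the function name `count_mnemonics` and the short prefix list are illustrative choices, not something from the answer.

```shell
# count_mnemonics
#   Reads `objdump -d --no-show-raw-insn` output on stdin and prints
#   "count mnemonic" pairs, most frequent first.
count_mnemonics() {
    awk -F'\t' '
        # Instruction lines look like "  401126:<TAB>push   rbp".
        $1 ~ /^ *[0-9a-f]+:$/ && NF >= 2 {
            n = split($2, tok, /[ ]+/)
            m = tok[1]
            # Skip a leading prefix such as "lock" or "rep" (list is not exhaustive).
            if (n > 1 && m ~ /^(lock|rep|repz|repnz|repe|repne|notrack|bnd)$/)
                m = tok[2]
            if (m != "") count[m]++
        }
        END { for (m in count) print count[m], m }
    ' | sort -k1,1nr -k2,2
}

# Possible usage:
#   objdump -d --no-show-raw-insn -M intel -j .text "$(command -v ls)" | count_mnemonics
```

Symbol labels and blank lines are ignored because they do not match the address pattern, so no separate `sed` cleanup steps are needed. Like the original command, this yields a static count only.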

answered Feb 20 '20 at 18:43

nonForgivingJesus


Thanks for posting a nice solution. I removed the first `sed`, since otherwise I don't get any output. The contents of `.text` are not needed either, so `-s` can be dropped too. Now I do get a listing of instructions with their counts (along with some extras). – wolfram77 Feb 20 '20 at 19:06
@wolfram77: Note that this answer gives you the *static* instruction count. It counts every instruction once, whether it's inside a tight loop or whether it's in error-handling code that never executes at all in normal operation. More often you want the *dynamic* instruction count, e.g. with `sde64 --mix` [How to characterize a workload by obtaining the instruction type breakdown?](//stackoverflow.com/q/58243626) and [How do I determine the number of x86 machine instructions executed in a C program?](//stackoverflow.com/q/54355631) – Peter Cordes Feb 21 '20 at 01:00
That looks very promising. The `sde-mix-out.txt` for `ls` lists a bunch of opcode types and their counts, and there are several of those blocks. Running it on `ls` of a directory, all these counts seem to match. When I do a diff, the only changes I see are most likely due to the thread ID and memory addresses changing. Thanks, I will look into it further in order to understand the output. – wolfram77 Feb 21 '20 at 10:28
Maybe it would only be possible to do static instruction usage statistics for another architecture, like Atmel AVR, or maybe run a few example programs on a simulator. – wolfram77 Feb 21 '20 at 11:01
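
To make the static versus dynamic distinction from the comments concrete, here is a minimal sketch that weights a static listing by an execution trace. Both input formats are assumptions made for illustration: the static file holds "address mnemonic" pairs (for example derived from a disassembly), and the trace file holds one executed instruction address per line, as a simulator or single-stepping debugger might produce. The function name `dynamic_mix` is hypothetical, and this is not how `sde64` itself works.

```shell
# dynamic_mix STATIC TRACE
#   STATIC: lines of "address mnemonic" (assumed format)
#   TRACE:  one executed instruction address per line (assumed format)
#   Prints "count mnemonic" pairs, counting each executed instance.
dynamic_mix() {
    awk '
        NR == FNR { mnem[$1] = $2; next }    # first file: address -> mnemonic
        ($1 in mnem) { count[mnem[$1]]++ }   # second file: one hit per execution
        END { for (m in count) print count[m], m }
    ' "$1" "$2" | sort -k1,1nr -k2,2
}

# Possible usage:
#   dynamic_mix static.txt trace.txt
```

With a loop body that runs three times, its instructions each appear once in the static listing but three times in the dynamic mix, which is the difference the comment above describes.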
