This is going to vary by operating system, compiler, compiler version, programming language, and possibly architecture extensions of the target you're looking at. It further depends on where you draw the line between what is or is not "the same" instruction.
On arm64, if you consider `movz w0, 0` and `movz w0, 1` to be different, then you can just take any binary, chunk it up into 4-byte blocks, print them in hexadecimal and run that through `sort | uniq -c`.
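As a rough illustration of that chunk-and-count approach, here is a minimal Python sketch. It assumes `blob` holds only the raw bytes of a code section (feeding it a whole executable would also count headers and data), and the function name `count_words` is made up for this example.

```python
from collections import Counter
import struct


def count_words(blob: bytes) -> Counter:
    """Count how often each 32-bit little-endian word occurs in blob."""
    usable = len(blob) - len(blob) % 4  # ignore a trailing partial word
    words = struct.iter_unpack("<I", blob[:usable])
    return Counter(f"{word:08x}" for (word,) in words)


if __name__ == "__main__":
    # Four hand-assembled words: movz w0, 0 / movz w0, 1 / movz w0, 0 / ret
    sample = struct.pack("<4I", 0x52800000, 0x52800020, 0x52800000, 0xD65F03C0)
    for word, count in count_words(sample).most_common():
        print(count, word)
```

Every distinct encoding is its own bucket here, so the two `movz` variants are counted separately.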
If you consider all `movz` to be the same instruction, then the question arises whether `movz w0` and `movz x0` are the same. And if that is a yes, then there are other instructions like `add` that have different forms where the third operand is either a register or an immediate. And there are things like `ldr` with pre- and post-indexing modes that introduce additional effects. And on top of all of that, there are pseudo-instructions like `mov` that can be encoded as either `movz`, `movn` or `orr`, depending on what the operands are.
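To make the granularity question concrete, the following sketch counts the same listing two ways: once by exact text and once by mnemonic only. The six-line `SAMPLE` listing is invented for illustration and is not taken from any real binary.

```python
from collections import Counter

# Hypothetical disassembly lines covering the forms discussed above.
SAMPLE = [
    "movz w0, 1",
    "movz x0, 1",
    "add x1, x2, x3",
    "add x1, x2, 0x10",
    "ldr x0, [x1, 8]!",
    "ldr x0, [x1], 8",
]


def count_exact(lines):
    """Treat every distinct line of text as its own instruction."""
    return Counter(" ".join(line.split()) for line in lines if line.strip())


def count_mnemonics(lines):
    """Treat everything that shares a mnemonic as the same instruction."""
    return Counter(line.split()[0] for line in lines if line.strip())


if __name__ == "__main__":
    print(len(count_exact(SAMPLE)), "distinct lines")
    print(dict(count_mnemonics(SAMPLE)))
```

Neither extreme is obviously right, which is why the answer depends on where you draw the line.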
But let's look at a concrete example: the `/bin/bash` binary on macOS 12.6.1.
The `__TEXT.__text` section of its arm64e slice has 118587 instructions. If we disassemble the entirety of it and try to normalise it by replacing all immediates with `...`, all general-purpose registers as well as `(w|x)zr` and `(w)sp` with `rN` and all floating-point registers with `fN` with this command:
perl -pe 's/b\.\w{2}/b.cond/g;s/\b(x[0-9]+|sp|xzr)\b/rN/g;s/\bw([0-9]+|sp|zr)/rN/g;s/\b[dq][0-9]+/fN/g;s/-?0x[0-9a-f]+/.../g;s/, -?\d+/, .../g;s/(lsl|lsr|asr|uxtw|sxtw) (\d+|0x[0-9a-f]+)/.../g;s/, \.\.\.\]/]/g'
And then sort the result by frequency with this:
sort | uniq -c | perl -pe 's/^\s+//g' | sort -V
And then choose an arbitrary cutoff at the `ret` instruction, we get:
265 ret
268 brk ...
269 autibsp
271 eor rN, rN, rN, ...
284 sub rN, rN, rN
299 cset rN, eq
312 cmn rN, ...
322 ldur rN, [rN]
332 add rN, rN, rN, ...
363 csel rN, rN, rN, eq
392 orr rN, rN, ...
432 paciza rN
450 ldr rN, [rN, rN]
476 strb rN, [rN]
507 ldp fN, fN, [rN]
578 sxtw rN, rN
745 and rN, rN, ...
780 tbnz rN, ..., ...
805 tbz rN, ..., ...
866 stp rN, rN, [rN]!
871 add rN, rN, rN
978 ldp rN, rN, [rN], ...
1028 invalid
1141 stp fN, fN, [rN]
1170 retab
1290 pacibsp
1457 sub rN, rN, ...
1520 cbnz rN, ...
1567 cmp rN, rN
1622 ldrb rN, [rN]
1695 adrp rN, ...
2548 ldr rN, ...
3504 stp rN, rN, [rN]
3537 ldp rN, rN, [rN]
3996 adr rN, ...
4644 cmp rN, ...
4664 add rN, rN, ...
4728 cbz rN, ...
4966 b ...
5294 str rN, [rN]
5601 b.cond ...
7451 mov rN, ...
7691 nop
8270 ldr rN, [rN]
10181 bl ...
11585 mov rN, rN
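For readers who would rather not parse the perl one-liner, the normalise-and-count steps used to produce this table can be approximated in Python. This is a sketch, not the original tooling: it applies the same eight substitutions in the same order, assumes a disassembler that prints immediates without a leading `#` (which is what the perl patterns imply), and sorts by count followed by text instead of using `sort -V`. The names `RULES`, `normalise` and `histogram` are made up for this example.

```python
import re
import sys
from collections import Counter

# The same substitutions, in the same order, as the perl one-liner.
RULES = [
    (re.compile(r"b\.\w{2}"), "b.cond"),                 # condition codes
    (re.compile(r"\b(x[0-9]+|sp|xzr)\b"), "rN"),         # 64-bit GPRs
    (re.compile(r"\bw([0-9]+|sp|zr)"), "rN"),            # 32-bit GPRs
    (re.compile(r"\b[dq][0-9]+"), "fN"),                 # FP/SIMD registers
    (re.compile(r"-?0x[0-9a-f]+"), "..."),               # hex immediates
    (re.compile(r", -?\d+"), ", ..."),                   # decimal immediates
    (re.compile(r"(lsl|lsr|asr|uxtw|sxtw) (\d+|0x[0-9a-f]+)"), "..."),
    (re.compile(r", \.\.\.\]"), "]"),                    # drop load/store offsets
]


def normalise(line):
    """Apply every rule in order and trim surrounding whitespace."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line.strip()


def histogram(lines):
    """Return (count, normalised text) pairs, least frequent first."""
    counts = Counter(normalise(line) for line in lines if line.strip())
    return sorted((n, text) for text, n in counts.items())


if __name__ == "__main__":
    # Expects one disassembled instruction per line on stdin.
    for n, text in histogram(sys.stdin):
        print(n, text)
```

Note that the last rule folds immediate offsets into the plain `[rN]` form, which is why the table has no separate entries for loads and stores with an offset.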
The entries listed above amount to 112015 instructions, or 94.5% of the total. A few things stand out here:
- This is an "arm64e" binary, which is Apple's name for their ABI with pointer authentication. On a regular arm64 target, all `retab` would be `ret`, and `pacibsp`, `autibsp` and `paciza` would not be present at all.
- Almost all occurrences of `brk` follow an `autibsp`, which is a compiler change Apple was forced to make in order to address a structural weakness of the `FEAT_PAuth` architecture extension. The remaining occurrences are very likely invocations of `__builtin_trap()` in the source.
- `nop` is very common. This is due to PC-relative addressing, where the compiler emits two or three instructions like `adrp x0, ...; add x0, x0, ...`, which can then be optimised at link time to `adr x0, ...; nop` if the target address is close enough to the instruction. It is debatable whether nops "run" though, or whether they are just deleted from the instruction pipeline.
- `invalid` is very common. These are jump tables found after the functions that use them. They are data, not code, and thus never actually run.
- Floating-point registers only show up in two instructions: `stp` and `ldp`. The compiler commonly uses floating-point registers for memcpy and memset operations, for one because these registers can hold up to 128 bits, and for another because it avoids having to spill general-purpose registers, which are a lot more likely to have already been allocated than floating-point registers, at least in command-line binaries like `bash`. I would expect similar conditions in operating system kernels and embedded firmware, but in graphics-related code or machine learning, this may look very different.
- The vast majority of `cmn` instructions check against the value `-1`. This is exceedingly likely a result of API design in C, where many functions return -1 to signal an error, and not inherent to the architecture.
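Two of the observations above can be checked with a few lines of arithmetic. The sketch below models, in Python, (a) why `cmn rN, 1` amounts to a test against -1: `cmn` adds its operands and sets the flags, so the zero flag is set exactly when the register holds all ones; and (b) the "close enough" condition for `adr`, whose signed 21-bit offset reaches 1 MiB in either direction. The function names are invented for this example, and only the zero flag is modelled.

```python
def cmn_sets_zero_flag(value, imm, width=64):
    """Model the Z flag after `cmn reg, imm` on a width-bit register."""
    mask = (1 << width) - 1
    return (((value & mask) + imm) & mask) == 0


def fits_adr(pc, target):
    """True if target is reachable from pc with a single adr instruction."""
    offset = target - pc
    return -(1 << 20) <= offset < (1 << 20)


if __name__ == "__main__":
    print(cmn_sets_zero_flag(-1, 1))         # register holds -1
    print(cmn_sets_zero_flag(0, 1))          # register holds 0
    print(fits_adr(0x100000000, 0x1000FFFFF))
```

So a check like `if (fd == -1)` in C can compile down to a `cmn` with an immediate of 1 followed by a conditional branch.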
And now it depends on whether you want to merge some of these instructions into one category or not. But by and large, I think the pattern is clear:
- Register moves
- Immediate construction
- Function calls
- Unconditional branches
- Conditional branches
- Register compares
- Memory loads
- Memory stores
make up the vast majority of instructions, and it makes sense, because with these building blocks, you can do essentially anything. Almost all other instructions are some kind of optimisation on those, which are more niche and thus occur less often. The remaining instructions are architecture-specific and expose things you couldn't otherwise get, like `brk` or `msr`, but these are going to be even rarer. And this is likely to be the case no matter what architecture you look at.
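One way to put rough numbers on those eight categories is to group some of the larger entries from the table and add up their counts. The counts below are copied from the table, but the assignment of entries to categories is my own reading and only covers a subset of the rows, so treat the totals as a lower bound.

```python
TOTAL_INSTRUCTIONS = 118587  # size of the __text section, as stated above

# Category -> {normalised instruction: count from the table}.
# The grouping is an assumption made for this sketch.
CATEGORIES = {
    "register moves": {"mov rN, rN": 11585},
    "immediate construction": {"mov rN, ...": 7451},
    "function calls": {"bl ...": 10181},
    "unconditional branches": {"b ...": 4966},
    "conditional branches": {
        "b.cond ...": 5601, "cbz rN, ...": 4728, "cbnz rN, ...": 1520,
        "tbz rN, ..., ...": 805, "tbnz rN, ..., ...": 780,
    },
    "register compares": {"cmp rN, ...": 4644, "cmp rN, rN": 1567},
    "memory loads": {
        "ldr rN, [rN]": 8270, "ldp rN, rN, [rN]": 3537,
        "ldr rN, ...": 2548, "ldrb rN, [rN]": 1622,
    },
    "memory stores": {"str rN, [rN]": 5294, "stp rN, rN, [rN]": 3504},
}


def category_totals(categories):
    """Sum the per-instruction counts inside each category."""
    return {name: sum(entries.values()) for name, entries in categories.items()}


if __name__ == "__main__":
    totals = category_totals(CATEGORIES)
    for name, count in sorted(totals.items(), key=lambda item: -item[1]):
        print(f"{count:6d}  {count / TOTAL_INSTRUCTIONS:6.1%}  {name}")
```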