5

I am trying to add custom instructions to the x86-64 (amd64) ISA. I will build a binary that uses these instructions and then run it in the gem5 simulator. To find unused opcodes, I am referring to http://ref.x86asm.net/coder64.html. However, that site only lists the unused single-byte opcodes. What I am looking for is the decoder grammar of amd64 CPUs, so that I can work out which byte sequences are invalid. I tried some random byte sequences and found a few that are invalid (for example, 0xc5000c is an invalid 3-byte sequence). However, I would like the grammar that amd64 CPUs use to decode instructions, so that I can derive even longer invalid sequences. Are there any resources available for this?

Currently, to find invalid byte sequences, I use a C program that looks like this:

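/* Each asm statement drops raw bytes into main(); the binary is meant to be
   disassembled with objdump, not executed. */
int main(void)
{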
        asm volatile(".byte 0x06" ::: "memory");
        asm volatile(".byte 0x07" ::: "memory");
        asm volatile(".byte 0x0E" ::: "memory");
        asm volatile(".byte 0x16" ::: "memory");
        asm volatile(".byte 0x17" ::: "memory");
        asm volatile(".byte 0x1E" ::: "memory");
        asm volatile(".byte 0x1F" ::: "memory");
        asm volatile(".byte 0x27" ::: "memory");
        asm volatile(".byte 0x2F" ::: "memory");
        asm volatile(".byte 0x37" ::: "memory");
        asm volatile(".byte 0x3F" ::: "memory");
        asm volatile(".byte 0x60" ::: "memory");
        asm volatile(".byte 0x61" ::: "memory");
        asm volatile(".byte 0x62" ::: "memory");
        asm volatile(".byte 0x82" ::: "memory");
        asm volatile(".byte 0x9A" ::: "memory");
        asm volatile(".byte 0xC4" ::: "memory");
        asm volatile(".byte 0xC5" ::: "memory");
        asm volatile(".byte 0xD4" ::: "memory");
        asm volatile(".byte 0xD5" ::: "memory");
        asm volatile(".byte 0xD6" ::: "memory");
        asm volatile(".byte 0xEA" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x00" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x01" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x02" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x03" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x04" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x05" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x06" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x07" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x08" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x09" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x0A" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x0B" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x0C" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x0D" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x0E" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x0F" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x10" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x11" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x13" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x17" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x18" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x19" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x1A" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x1B" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x1C" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x1D" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x1E" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x1F" ::: "memory");
        asm volatile(".byte 0xC5, 0x00, 0x01" ::: "memory");

        return 0;
} 
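
The hand-written `.byte` lines above could also be generated. What follows is a minimal sketch, not part of the original program: `format_probe_line` is a hypothetical helper name, and the output format simply mirrors the statements shown above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Appends s to buf (which holds *used characters) if it fits within cap. */
static int append(char *buf, size_t cap, size_t *used, const char *s)
{
    size_t n = strlen(s);
    if (*used + n >= cap)          /* keep room for the terminating NUL */
        return -1;
    memcpy(buf + *used, s, n + 1);
    *used += n;
    return 0;
}

/* Hypothetical helper: writes one statement of the form
 *   asm volatile(".byte 0xC5, 0x00, 0x0C" ::: "memory");
 * Returns the number of characters written, or -1 if buf is too small. */
int format_probe_line(char *buf, size_t cap, const uint8_t *bytes, size_t n)
{
    size_t used = 0;
    char tok[8];                   /* ", 0xNN" plus NUL needs 7 bytes */

    if (cap == 0)
        return -1;
    buf[0] = '\0';
    if (append(buf, cap, &used, "asm volatile(\".byte ") != 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        snprintf(tok, sizeof tok, "%s0x%02X", i ? ", " : "", (unsigned)bytes[i]);
        if (append(buf, cap, &used, tok) != 0)
            return -1;
    }
    if (append(buf, cap, &used, "\" ::: \"memory\");") != 0)
        return -1;
    return (int)used;
}
```

Calling this in a loop over the third byte and printing each result would reproduce the `C5 00 xx` block of the program without typing it by hand.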

To see whether an instruction is invalid, I compile the program to an executable and then disassemble it with objdump, using the following command: `gcc main.c && objdump -D -j .text a.out`. When I run this on the C program above, I get:

...
    10ed:   c6 05 1c 2f 00 00 01    movb   $0x1,0x2f1c(%rip)        # 4010 <__TMC_END__>
    10f4:   5d                      pop    %rbp
    10f5:   c3                      ret
    10f6:   66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
    10fd:   00 00 00 
    1100:   c3                      ret
    1101:   66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
    1108:   00 00 00 00 
    110c:   0f 1f 40 00             nopl   0x0(%rax)

0000000000001110 <frame_dummy>:
    1110:   f3 0f 1e fa             endbr64
    1114:   e9 67 ff ff ff          jmp    1080 <register_tm_clones>

0000000000001119 <main>:
    1119:   55                      push   %rbp
    111a:   48 89 e5                mov    %rsp,%rbp
    111d:   06                      (bad)
    111e:   07                      (bad)
    111f:   0e                      (bad)
    1120:   16                      (bad)
    1121:   17                      (bad)
    1122:   1e                      (bad)
    1123:   1f                      (bad)
    1124:   27                      (bad)
    1125:   2f                      (bad)
    1126:   37                      (bad)
    1127:   3f                      (bad)
    1128:   60                      (bad)
    1129:   61                      (bad)
    112a:   62 82                   (bad)
    112c:   9a                      (bad)
    112d:   c4                      (bad)
    112e:   c5 d4 d5                (bad)
    1131:   d6                      (bad)
    1132:   ea                      (bad)
    1133:   c5 00 00                (bad)
    1136:   c5 00 01                (bad)
    1139:   c5 00 02                (bad)
    113c:   c5 00 03                (bad)
    113f:   c5 00 04                (bad)
    1142:   c5 00 05                (bad)
    1145:   c5 00 06                (bad)
    1148:   c5 00 07                (bad)
    114b:   c5 00 08                (bad)
    114e:   c5 00 09                (bad)
    1151:   c5 00 0a                (bad)
    1154:   c5 00 0b                (bad)
    1157:   c5 00 0c                (bad)
    115a:   c5 00 0d                (bad)
    115d:   c5 00 0e                (bad)
    1160:   c5 00 0f                (bad)
    1163:   c5 00 10                (bad)
    1166:   c5 00 11                (bad)
    1169:   c5 00 13                (bad)
    116c:   c5 00 17                (bad)
    116f:   c5 00 18                (bad)
    1172:   c5 00 19                (bad)
    1175:   c5 00 1a                (bad)
    1178:   c5 00 1b                (bad)
    117b:   c5 00 1c                (bad)
    117e:   c5 00 1d                (bad)
    1181:   c5 00 1e                (bad)
    1184:   c5 00 1f                (bad)
    1187:   c5 00 01                (bad)
    118a:   b8 00 00 00 00          mov    $0x0,%eax
    118f:   5d                      pop    %rbp
    1190:   c3                      ret
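
Picking the `(bad)` rows out of a listing like this can be automated. The sketch below assumes the column layout shown above (address, colon, space-separated hex bytes, then the mnemonic); `is_bad_line` and `parse_listing_bytes` are hypothetical names. Note that objdump groups `62 82` and `c5 d4 d5` into single rows, because `62` and `C5` start multi-byte prefixes in 64-bit mode, so the bytes on a `(bad)` row are not always one opcode.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* True if the disassembler flagged this listing line as undecodable. */
int is_bad_line(const char *line)
{
    return strstr(line, "(bad)") != NULL;
}

/* True if p starts with exactly two hex digits followed by whitespace or end. */
static int is_hex_pair(const char *p)
{
    return isxdigit((unsigned char)p[0]) && isxdigit((unsigned char)p[1]) &&
           (p[2] == '\0' || isspace((unsigned char)p[2]));
}

/* Copies the raw instruction bytes of one listing line into out.
 * Returns the number of bytes found (0 for labels and other non-code lines). */
size_t parse_listing_bytes(const char *line, uint8_t *out, size_t cap)
{
    const char *p = strchr(line, ':');
    size_t n = 0;

    if (p == NULL)
        return 0;
    p++;
    while (*p == ' ' || *p == '\t')
        p++;
    while (n < cap && is_hex_pair(p)) {
        out[n++] = (uint8_t)strtoul(p, NULL, 16);
        p += 2;
        /* A single space means another byte follows; wider gaps end the column. */
        if (*p == ' ' && is_hex_pair(p + 1))
            p++;
        else
            break;
    }
    return n;
}
```

This only reports what objdump considers undecodable; it does not make the search itself any less brute-force.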

What I am looking for is a faster way to find such invalid byte sequences, preferably one that does not involve the brute-force approach I am using.

Peter Cordes
Setu
  • Since this is the "gold" and therefore the intellectual property of each processor and its manufacturer, I doubt that you will get answers on "_the decoder grammar of amd64 CPUs_". However, there might be a list of unused opcodes that are officially recognized as such. – the busybee Nov 01 '22 at 15:28
  • 1
    Have you seen [this BlackHat paper](https://www.blackhat.com/docs/us-17/thursday/us-17-Domas-Breaking-The-x86-Instruction-Set-wp.pdf) ? Interesting stuff. E.g. `66 e9`, JMP with 16 bits operand size override doesn't work on Intel x64, but does on AMD. – MSalters Nov 01 '22 at 16:03
  • 1
    Most of the coding-space is used up in 32-bit mode, except stuff like invalid encodings of instructions that e.g. require a memory operand in the ModRM byte. AMD64 removed many 1-byte opcodes, freeing them up for other use in 64-bit mode only. See also [which MOV instructions in the x86 are not used or the least used, and can be used for a custom MOV extension](https://stackoverflow.com/q/60745735) re: finding free opcodes to use with GEM5. – Peter Cordes Nov 01 '22 at 22:38
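
The single-byte opcodes that AMD64 removed in 64-bit mode can be written down as a small lookup table. This is a sketch based on the list in the question's program, with two adjustments that are my own: `CE` (INTO) is added, and `62`, `C4` and `C5` are left out because in 64-bit mode they are the EVEX and VEX prefixes rather than free opcodes. The function name is hypothetical, and newer ISA extensions may claim some of these bytes, so check the current manuals before relying on any of them.

```c
#include <stddef.h>
#include <stdint.h>

/* One-byte opcodes that raise #UD in 64-bit mode in the base AMD64 ISA:
 * PUSH/POP of segment registers, DAA/DAS/AAA/AAS, PUSHA/POPA, the 0x82
 * alias of 0x80, far CALL/JMP with an immediate pointer, INTO, AAM/AAD,
 * and the undocumented 0xD6. */
static const uint8_t invalid_1byte_long_mode[] = {
    0x06, 0x07, 0x0E, 0x16, 0x17, 0x1E, 0x1F,
    0x27, 0x2F, 0x37, 0x3F,
    0x60, 0x61, 0x82, 0x9A, 0xCE,
    0xD4, 0xD5, 0xD6, 0xEA,
};

/* Returns 1 if opcode is in the table above, 0 otherwise. */
int is_invalid_1byte_in_long_mode(uint8_t opcode)
{
    size_t count = sizeof invalid_1byte_long_mode / sizeof invalid_1byte_long_mode[0];

    for (size_t i = 0; i < count; i++) {
        if (invalid_1byte_long_mode[i] == opcode)
            return 1;
    }
    return 0;
}
```

A table like this only covers one-byte opcodes; longer invalid encodings, such as register forms of instructions that require a memory operand, need per-instruction rules.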

2 Answers

3

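There are "official undefined" instructions: UD1 (0F B9) and UD2 (0F 0B), and Intel also documents UD0 (0F FF). They are reserved so that they always raise the invalid-opcode exception (#UD), which suits your specific purpose.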

You seem to assume that every x86-64 CPU decodes instructions in the same way. This is certainly not the case. We know Intel and AMD have differences. You can't even assume Intel or AMD is consistent with itself. And with Alder Lake, you can't even assume that a CPU is consistent with itself. We know the P-cores of Alder Lake have AVX-512 logic, but the E-cores don't. Presumably only the P-core decoders understand the EVEX encodings.

What you're doing now with objdump only tells you how objdump interprets the bytes, which may not match what AMD and/or Intel CPUs do.

MSalters
  • 1
    When booted with any E-cores active, Alder Lake CPUs disable AVX-512 support on the P-cores. This might happen internally, in the "microcode", not just a matter of software (e.g. the BIOS) choosing to disable it or not. Unfortunately Intel decided the world isn't ready for heterogeneous instruction-set support, and made it impossible to get into that state AFAIK. (And later took steps to fuse off AVX-512 in hardware in later Alder Lake steppings, so it can't even be enabled by booting with the E-cores disabled, even with a firmware that loads old microcode. Glad I didn't buy one.) – Peter Cordes Nov 01 '22 at 22:36
  • 2
    Also note that this is about finding free opcodes to use for experiments in GEM5, so you know what will be decoding your machine code: the GEM5 software. The fact that some future CPU might decode those opcodes to something else in 64-bit mode isn't a problem for that use-case. – Peter Cordes Nov 01 '22 at 22:39
  • 3
    But `UD1` and `UD2` *do* have defined behavior: they generate the `#UD` exception. And there is code that relies on this, as a way to ensure that a program aborts instead of continuing execution. [Compilers can insert it when code is guaranteed to cause language-undefined behavior](https://godbolt.org/z/x5r43vcoz). So if you repurpose those opcodes to do something other than `#UD`, then previously well-defined machine code will change its behavior, and that may not be good. – Nate Eldredge Nov 02 '22 at 02:12
0

As the comments suggest, I don't think it is possible to get the decoder grammar of amd64 CPUs. So I think I'll look at the decoder grammar that gem5 uses instead, and extract the required byte sequences from that.

Thanks

Setu