Is emitting x86 instructions dependent on endianness?

Question

Assume I'm trying to emit IMUL into memory using this function:

```c
void emit(unsigned char byte)
{
    instruction_start[offset++] = byte;
}
```

IMUL has this format: `0F AF /r`. If I emit 1 byte at a time, should I consider endianness and reverse the bytes? That is, should I emit `/r` first, then `0F AF`?

I am currently not reversing the bytes, just emitting each byte as is, and it's working, but I'm not sure why.

Edit: It seems that instructions are treated like strings: they are read 1 byte at a time and keep their documented order, except for multi-byte immediate values within them: https://stackoverflow.com/a/60905404
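
As a minimal sketch of that conclusion, the following emits the register-to-register form of `0F AF /r` one byte at a time, in documented order. The buffer size, the `modrm_reg` helper, and the `emit_imul_r32_r32` name are hypothetical additions for illustration, and the bounds check is not part of the original `emit`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size buffer standing in for the question's
   instruction_start / offset pair. */
static unsigned char instruction_start[16];
static size_t offset = 0;

static void emit(unsigned char byte)
{
    assert(offset < sizeof instruction_start); /* added bounds check */
    instruction_start[offset++] = byte;
}

/* Hypothetical helper: ModRM byte for two register operands (mod = 11b).
   reg goes in bits 5..3, rm in bits 2..0. */
static unsigned char modrm_reg(unsigned reg, unsigned rm)
{
    return (unsigned char)(0xC0u | ((reg & 7u) << 3) | (rm & 7u));
}

/* imul r32, r/m32 is documented as 0F AF /r.
   The bytes go into memory in exactly that order, no reversal. */
static void emit_imul_r32_r32(unsigned dst, unsigned src)
{
    emit(0x0F);
    emit(0xAF);
    emit(modrm_reg(dst, src));
}
```

With EAX encoded as register 0 and ECX as register 1, `emit_imul_r32_r32(0, 1)` writes `0F AF C1`, the encoding of `imul eax, ecx`.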

Intel's docs describe x86 instructions in terms of a byte stream (in memory order), except for wider immediates and displacements. You could have tested this yourself by looking at the output of any assembler. — Peter Cordes, Apr 29 '20 at 13:45
Took me a minute to find it, but I recently explained it in more detail here: [How to interpret objdump disassembly output columns?](https://stackoverflow.com/a/60905404). Single bytes don't have endianness, only multi-byte values like an `imm32` for `69 c1 55 44 00 00 imul eax,ecx,0x4455` — Peter Cordes, Apr 29 '20 at 13:55
Considering endianness in the Intel world means _consider little endian_. — Lundin, Apr 29 '20 at 13:55
You emit them in the order they are documented. `0F AF /r`. Consider opcodes rather like a kind of "strings". — Jabberwocky, Apr 29 '20 at 13:57
@Lundin: Yes, any multi-byte things like immediates or displacements are little-endian, but the OP's question is whether they should reverse the separate bytes of the mandatory prefixes + opcode relative to Intel's documentation of the byte sequence, like https://www.felixcloutier.com/x86/imul — Peter Cordes, Apr 29 '20 at 13:58
The questions are approaching the multi-byte-opcode endianness question from different angles (assembling vs. disassembling), but really are duplicates. The bolded parts of my answer there exactly answer this, so IMO it's fully a duplicate. Just not one that would come up in a search, so this is a possibly useful signpost. — Peter Cordes, Apr 29 '20 at 14:02
Thanks for your suggestions and links. So basically, opcodes in memory are treated as strings, unless there is an immediate within that string, in which case the immediate will have reversed bytes on x86, not the whole instruction. — , Apr 29 '20 at 14:02
@Josh The bytes are not “reversed,” they are in “little endian” byte order. It's big endian machines that have their bytes the wrong way round :-P. Correct code never needs to “reverse bytes.” Instead, explicitly compute each byte of the little endian representation of your number and emit them one after another. — fuz, Apr 29 '20 at 14:03
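
A rough C sketch of the advice above, computing each byte of the little-endian representation explicitly so the result does not depend on the host's byte order. The names `code`, `pos`, `emit8`, `emit_le32`, and `emit_imul_r32_r32_imm32` are hypothetical; the encoding `69 /r id` is the `imul r32, r/m32, imm32` form quoted earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output buffer for the generated machine code. */
static uint8_t code[16];
static size_t pos = 0;

static void emit8(uint8_t b)
{
    assert(pos < sizeof code);
    code[pos++] = b;
}

/* Emit a 32-bit value least-significant byte first.
   Shifts operate on the value, not on memory, so this gives the same
   byte sequence on big-endian and little-endian hosts. */
static void emit_le32(uint32_t v)
{
    emit8((uint8_t)(v & 0xFFu));
    emit8((uint8_t)((v >> 8) & 0xFFu));
    emit8((uint8_t)((v >> 16) & 0xFFu));
    emit8((uint8_t)((v >> 24) & 0xFFu));
}

/* imul r32, r/m32, imm32 with register operands: 69 /r id.
   Opcode and ModRM are single bytes in documented order;
   only the imm32 is a multi-byte little-endian field. */
static void emit_imul_r32_r32_imm32(unsigned dst, unsigned src, uint32_t imm)
{
    emit8(0x69);
    emit8((uint8_t)(0xC0u | ((dst & 7u) << 3) | (src & 7u)));
    emit_le32(imm);
}
```

Calling `emit_imul_r32_r32_imm32(0, 1, 0x4455)` yields `69 c1 55 44 00 00`, matching the `imul eax,ecx,0x4455` example from the earlier comment.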
@PeterCordes Right... my preferred byte order for a single byte is: single byte first :) — Lundin, Apr 29 '20 at 14:04
@PeterCordes One side question, on multi length instruction systems, how does the CPU know when to stop reading a byte stream as the instruction boundary? based on the opcode itself? — , Apr 29 '20 at 14:10
@Josh yes [How does the CPU/assembler know the size of the next instruction?](https://stackoverflow.com/q/25101978/995714) — phuclv, Apr 29 '20 at 14:12
Ross explains it well on [How does the CPU know how many bytes it should read for the next instruction, considering instructions have different lenghts?](https://stackoverflow.com/q/56385995). In practice hardware brute-forces parallel length-finding of up to 5 instructions per clock cycle (Skylake) from a buffer of up to 16 bytes to maintain the illusion of having parsed instructions 1 byte at a time and executed them sequentially before decoding the next. https://www.realworldtech.com/sandy-bridge/3/ and https://agner.org/optimize/ go into some details about the front-end in real CPUs. — Peter Cordes, Apr 29 '20 at 14:12
@Josh It's a bit complicated, but basically it works that way. The opcode tells what operands appear and if present, a modr/m byte tells whether a sib byte and displacement bytes are present. Prefixes must be accounted for, too. — fuz, Apr 29 '20 at 14:12
@PeterCordes Since you said instructions are read 1 byte at a time, if I emit 4- or 8-byte instructions at once we will have a problem, right? Since now my instruction is stored in LE byte order and will be backwards when the CPU wants to read it? — , Apr 29 '20 at 14:18
@Josh: It doesn't matter *how* you get the bytes into memory as long as they're all there in the right order (the order a known working assembler like NASM would use, or any of the existing assembler libraries). That's something you have to get right in the C program you're writing ([How to write endian agnostic C/C++ code?](https://stackoverflow.com/q/13994674)) but is basically unrelated to the fact that those bytes are machine instructions for some CPU. Any data stream (like an h.264 compressed video, PCM audio, or an IPv6 header) has to be emitted with the bytes in the right order. — Peter Cordes, Apr 29 '20 at 14:20
@PeterCordes but if I write `0F AF /r` into an array index at once then surely the bytes are reversed in memory; that's not C, that's just how things are stored on LE machines. — , Apr 29 '20 at 14:23
`0F AF /r` isn't valid C syntax. If you do it wrong in C then the bytes will be in memory in the wrong order. Don't write buggy programs, obviously. (And ideally write a program that will produce valid machine code regardless of running a big or little endian *C* implementation). e.g. use `memcpy` from a 2-byte array for the opcode. Of course you can't write it as `unsigned short imul_opcode = 0x0faf` and `memcpy` from that, if that's what you mean. — Peter Cordes, Apr 29 '20 at 14:26
@PeterCordes that's exactly what I meant, so it has to be done in chunks to avoid the endianness issue. — , Apr 29 '20 at 14:30
But you can do a 4-byte memcpy or otherwise help the compiler do a 4-byte store into a buffer. Most of the ways other than memcpy aren't safe or recommended, though. If you compile with `gcc -fno-strict-aliasing`, you can use `htole32` to convert from host to little-endian byte order and `*(uint32_t*)&buf[pos] = htole32(immediate)` or whatever to store the immediate for an `imul r,r, imm32`. Or you could use that for prefix + opcode bytes if you have them in the right order inside an integer. But yeah, for the prefix/opcode / modrm/sib bytes it's much easier to do single bytes in the src. — Peter Cordes, Apr 29 '20 at 14:35
Actually even `gcc -fno-strict-aliasing` doesn't make misaligned pointers safe, so that was a poor example. https://trust-in-soft.com/blog/2020/04/06/gcc-always-assumes-aligned-pointers/. By far the easiest and most easily portable thing is `memcpy` from arrays of `uint8_t` or single byte assignments. — Peter Cordes, Apr 29 '20 at 14:37
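
The `memcpy`-from-`uint8_t`-array approach recommended in the last comment might look like the sketch below. The names `out`, `out_pos`, `emit_bytes`, and `emit_imul_eax_ecx` are hypothetical; the comment inside the code contrasts it with the `unsigned short` idea rejected earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical output buffer for the generated machine code. */
static uint8_t out[16];
static size_t out_pos = 0;

/* Copy n bytes into the buffer in array order.
   memcpy has no alignment or strict-aliasing problems. */
static void emit_bytes(const uint8_t *src, size_t n)
{
    assert(n <= sizeof out - out_pos);
    memcpy(out + out_pos, src, n);
    out_pos += n;
}

static void emit_imul_eax_ecx(void)
{
    /* A byte array has no endianness: element 0 lands at the lowest
       address on every host, so this is 0F AF C1 in memory.
       By contrast, memcpy from a 16-bit integer holding 0x0FAF would
       store AF 0F on a little-endian host, which is the wrong order. */
    static const uint8_t imul_eax_ecx[] = { 0x0F, 0xAF, 0xC1 };
    emit_bytes(imul_eax_ecx, sizeof imul_eax_ecx);
}
```

This emits several bytes in one call while keeping the same memory layout as three single-byte stores.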
