
I'm attempting to write a small x86-64 JIT, and I'm in a little over my head in a few places.

I'm trying to JIT a simple function that loads a floating-point value into the xmm0 register and then returns it, but I am unsure how to encode the operands of the movsd instruction.

Any help would be greatly appreciated.

/* main.c */
#include <stdio.h>
#include <sys/mman.h>

#define xmm(n) (n)

typedef double(*fn)();

fn jit(){
    char* memory = mmap(NULL,
                        4096,
                        PROT_READ|PROT_WRITE|PROT_EXEC,
                        MAP_PRIVATE|MAP_ANONYMOUS,
                        -1, 0);
    int i=0;
    float myfloat = 3.1f;
    memory[i++] = 0x48; /* REX.W */
    memory[i++] = 0xf2; /*******************/
    memory[i++] = 0x0f; /* MOVSD xmm0, m64 */
    memory[i++] = 0x10; /*******************/

    memory[i++] = 0x47 | xmm(0) << 3; /* Not 100% sure this is correct */

    memory[i++] = 0; /* what goes here to load myfloat into xmm0? */

    memory[i++] = 0xc3; /* RET */
    return (fn) memory;
}

int main(){
    fn f = jit();
    printf("result: %f\n", (*f)());
    return 0;
}
Shootfast
  • There is no need for a `REX.W` as the operand size is fixed. After the opcode a `ModRM` byte is expected; its encoding and the subsequent bytes (and even other `REX` prefixes) depend on the addressing mode you need. Which specific form of `movsd xmm0, m64` are you encoding? BTW `nasm` and `ndisasm` are very handy when crafting instructions: the first can be used to quickly create a template, the second to check the results. – Margaret Bloom Jun 26 '19 at 18:30
  • In GNU C, you need [`__builtin___clear_cache` after writing a buffer, before you can execute it as a function pointer](https://stackoverflow.com/questions/35741814/how-does-builtin-clear-cache-work#comment85964322_35741869). Otherwise the stores to `memory[]` can be optimized away as dead stores. (x86 doesn't actually need any cache flushing, it's just to tell the optimizer that the store results get used as code and thus aren't dead after all.) – Peter Cordes Jun 26 '19 at 21:03

1 Answer


SSE instructions generally don't support immediates except for some rare instructions with a one-byte immediate to control their operation. Thus you need to:

  • store myfloat to some nearby memory area
  • generate a memory operand that references this area

Both steps are easy. For the first step, I'd simply use the beginning of memory and let the code start right afterwards. Note that in this case, you need to make sure to return a pointer to the beginning of the function, not the beginning of memory. Other solutions are possible. Just make sure that myfloat is stored within ±2 GiB of the code.

To generate the operand, revisit the Intel manuals. The addressing mode you want is a 32 bit RIP-relative operand. This is generated with mod = 0, r/m = 5. The displacement is a signed 32 bit number that is added to the value of RIP at the end of the instruction, i.e. the address of the next instruction (this is where the +4 comes from, as we have to factor in the length of the displacement).
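
The two pieces described above, the ModRM byte and the displacement, can be sketched as small helpers. The names `modrm_rip_relative` and `rip_disp32` are made up for illustration, and the sketch assumes the displacement is the last field of the instruction (no trailing immediate):

```c
#include <stdint.h>

/* ModRM byte for [rip + disp32]: mod = 00, reg = register number, r/m = 101.
   Registers xmm8..xmm15 additionally need REX.R, which is not handled here. */
unsigned char modrm_rip_relative(unsigned reg)
{
    return (unsigned char)(0x05 | ((reg & 7u) << 3));
}

/* Computes the disp32 for a RIP-relative operand.
   disp_field is the address where the 4-byte displacement will be written;
   the instruction is assumed to end right after it.
   Returns 0 on success, -1 if target is further than 2 GiB away. */
int rip_disp32(uint64_t disp_field, uint64_t target, int32_t *out)
{
    uint64_t next_insn = disp_field + 4;           /* RIP value the CPU adds to */
    int64_t delta = (int64_t)(target - next_insn); /* target minus next instruction */
    if (delta < INT32_MIN || delta > INT32_MAX)
        return -1;
    *out = (int32_t)delta;
    return 0;
}
```

The range check mirrors the ±2 GiB requirement: if it fails, the constant has to be placed closer to the code.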

Thus we have something like:

memory[i++] = 0xf2; /*******************/
memory[i++] = 0x0f; /* MOVSD xmm0, m64 */
memory[i++] = 0x10; /*******************/
memory[i++] = 0005 | xmm(0) << 3; /* mod = 0, r/m = 5: [rip + disp32] */
/* disp32 = address of myfloat minus address of the next instruction */
*(int *)(memory + i) = (int)((char *)addr_of_myfloat - (memory + i + 4));
i += 4;
memory[i++] = 0xc3; /* RET */

Note that the REX prefix is not needed here.

fuz
  • Also note that the function is declared to return a `double`, so you need an 8-byte constant. IDK why the OP is using a `3.1f` `float`; that just makes more work for themselves. – Peter Cordes Jun 26 '19 at 21:00
  • @PeterCordes Indeed! If desired, it is possible to load a 32 bit float from memory with `cvtss2sd`. – fuz Jun 28 '19 at 16:36
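
For the `cvtss2sd` variant mentioned in the last comment, a sketch under the same assumptions might look like the following. It keeps a 4-byte `float` in memory and converts it while loading; the opcode bytes `F3 0F 5A /r` are the `cvtss2sd xmm, xmm/m32` encoding, and the helper name `emit_load_float_as_double` is made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes a 4-byte float constant followed by
   "cvtss2sd xmm0, dword [rip + disp32]; ret" into buf (13 bytes total).
   Returns the offset of the first code byte within buf. */
size_t emit_load_float_as_double(unsigned char *buf, float value)
{
    size_t i = 0;
    memcpy(buf + i, &value, sizeof value);  /* constant lives at offset 0 */
    i += sizeof value;
    size_t code_start = i;

    buf[i++] = 0xf3;                        /* CVTSS2SD xmm0, m32 */
    buf[i++] = 0x0f;
    buf[i++] = 0x5a;
    buf[i++] = 0x05;                        /* mod = 0, reg = xmm0, r/m = 5 */

    /* disp32 = offset of constant (0) minus offset of next instruction (i + 4) */
    int32_t disp = -(int32_t)(i + 4);
    memcpy(buf + i, &disp, sizeof disp);
    i += sizeof disp;

    buf[i++] = 0xc3;                        /* RET */
    return code_start;
}
```

The only differences from the `movsd` version are the opcode bytes and the size of the stored constant, which shifts the displacement accordingly.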