
I am learning SSE in x64 assembly and was trying to change the sign of a float number stored in xmm0. To do that, I used xorps with a mask that I stored in the rodata section of my executable.

mov     eax, ds:dword_403000
mov     [rbp-4], eax
movss   xmm0, [rbp-4]
xorps   xmm0, ds:xmmword_403004
movss   [rbp-4], xmm0

However, my program crashes at the xorps instruction. In IDA I get the following error:

404016: The instruction at 0x404016 referenced memory at 0xFFFFFFFFFFFFFFFF. The memory could not be read (exc.code c0000005, tid 10220)

But the mask seems to be there and to have the correct size. I do not understand why the referenced memory is 0xFFFFFFFFFFFFFFFF. Does anyone have an idea what I did wrong?

My program as seen in IDA
The float values in the rodata section

The program is generated by a compiler I am working on that directly outputs an executable without using an assembler or linker, so I can't post the assembly file, but I uploaded the executable for Windows here.

  • 1
    Hello and welcome to Stack Overflow. Please do not post pictures of text. Instead, post text as text. Also provide a [mcve] if possible. The code you have shown is incomplete (it is missing the definition of the rodata entries in question) and thus cannot be assembled as is. If the code was produced e.g. by compiling some C code, you might also want to show the corresponding high level code. – fuz Mar 04 '23 at 21:43
  • 3
    That said, the problem is most likely that `xmmword_403004` is not aligned to a multiple of 16 bytes. I can't say for sure as the definition is not included. I have not looked at your pictures as I do not look at pictures of text. – fuz Mar 04 '23 at 21:43
  • Is this unoptimized compiler output? `movss xmm0, ds:dword_403000` directly. Or better, `pcmpeqd xmm0,xmm0` / `psrlld xmm0, 31` to create the `0x80000000` mask. Don't use memory-source 16-byte operands with SSE unless you know they're aligned. For scalar, probably `movss` load into a different register. – Peter Cordes Mar 04 '23 at 21:59
  • @Peter Cordes, thank you for your answer. It is the output of a compiler I wrote for my own programming language, and it is very much not optimized right now. I tried adding some padding to start the mask at the address `403008` so the address is divisible by 16, but the crash is still there. I will try the `psrlld` method because it looks cleaner, but I am still confused about why this snippet doesn't work. I feel there is something I didn't understand about addressing. – confused_opossum Mar 04 '23 at 22:27
  • 3
    Is 403008 in hexadecimal? Then it's not divisible by 16 (it doesn't have a 0 as the least significant nibble) (showing an address in decimal would be odd) – harold Mar 04 '23 at 22:29
  • 1
    @harold: good catch, I am dumb. It works now, thanks a lot! – confused_opossum Mar 04 '23 at 22:35
  • 1
    Oops, `psrlld` was a typo, I meant `pslld`. https://www.felixcloutier.com/x86/psllw:pslld:psllq . As in [Fastest way to compute absolute value using SSE](https://stackoverflow.com/q/32408665) – Peter Cordes Mar 05 '23 at 00:08
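
The two points settled in the comments above can be checked with a small model in C. This is only a sketch, not the asker's compiler output: it assumes the addresses in the thread are hexadecimal, uses `0x403010` as a hypothetical next 16-byte-aligned location for the mask, and models a single 32-bit lane of an XMM register with `uint32_t`. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if addr is a multiple of 16. A 16-byte memory operand of a
   legacy-SSE instruction such as xorps must satisfy this. */
static int is_aligned16(uint64_t addr) {
    return (addr & 0xF) == 0;
}

/* Models one lane of: pcmpeqd xmm0, xmm0 / pslld xmm0, 31.
   All-ones shifted left by 31 leaves only the sign bit set. */
static uint32_t mask_shift_left31(void) {
    uint32_t ones = 0xFFFFFFFFu;   /* pcmpeqd result per lane */
    return ones << 31;             /* 0x80000000 */
}

/* Models the same sequence with a logical right shift (psrld) instead.
   Only the lowest bit survives, so this is NOT a sign mask. */
static uint32_t mask_shift_right31(void) {
    uint32_t ones = 0xFFFFFFFFu;
    return ones >> 31;             /* 0x00000001 */
}

/* Models xorps on a single float lane: XOR the IEEE-754 binary32 bit
   pattern of x with mask. memcpy avoids aliasing problems. */
static float flip_sign(float x, uint32_t mask) {
    uint32_t bits;
    assert(sizeof bits == sizeof x);
    memcpy(&bits, &x, sizeof bits);
    bits ^= mask;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

With these helpers, `is_aligned16(0x403004)` and `is_aligned16(0x403008)` both return 0, while `is_aligned16(0x403010)` returns 1. `mask_shift_left31()` yields `0x80000000`, and `flip_sign(1.5f, mask_shift_left31())` yields `-1.5f`. A misaligned 16-byte memory operand raises a general-protection fault rather than a page fault, which would be consistent with the reported address `0xFFFFFFFFFFFFFFFF` not matching the mask's address.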
