
I'm trying to flip the sign bit of the least significant float in xmm0. I tried converting -0 into another xmm register and XORing it with xmm0. Unfortunately, although I managed to flip the sign, the value of my float is lost. Is there a way to use xorps in asm to flip the sign bit? I've seen some posts on Stack Overflow doing exactly that, but in C.

# xmm0 contains 4 floats
# goal is to flip the sign of the least significant one
mov eax, -0
cvtsi2ss xmm1, eax
xorps    xmm0, xmm1
pedzer
  • `mov eax, 0x80000000; movd xmm1, eax; xorps xmm0, xmm1` – Jester Jul 11 '19 at 15:39
  • @Jester Many thanks! That was quick. I had tried something similar, but it didn't work out as I expected. Unfortunately, I can't accept your solution. – pedzer Jul 11 '19 at 15:49
  • FYI: -0 and 0 are the same 2's complement integer number. -0.0 and 0.0 are different floating point numbers. So `mov eax, -0` is equivalent to `mov eax, 0` – pcarter Jul 11 '19 at 16:19
  • @Jester Or place `0x80000000` in memory and reference it with a memory operand. – fuz Jul 11 '19 at 16:22
  • Could one of you post this as an answer real quick? Then I can mark it as done. The solutions mentioned by @Jester, pcarter and fuz fit my problem, so just copy and paste them. – pedzer Jul 11 '19 at 17:04
  • @Jester: If you're going to generate it on the fly, I'd tend to go for `pcmpeqd xmm1,xmm1` / `pslld xmm1, 31` unless you specifically want to leave the high elements unmodified. – Peter Cordes Jul 11 '19 at 20:55

1 Answer


To flip the sign bit of the least significant float in xmm0, XOR it with a mask in which only bit 31 of the lowest element is set, as Jester posted in the comments on my question:

mov eax, 0x80000000
movd xmm1, eax
xorps xmm0, xmm1

(Credit to Jester and everyone else who helped me. I just wanted to mark this topic as done.)
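
To illustrate why the attempt in the question fails while the XOR mask works, here is a minimal scalar sketch in C. It only models the bit patterns involved and does not use the SSE instructions themselves; the helper names `float_bits`, `model_cvtsi2ss` and `flip_sign` are made up for this illustration, and a 32-bit IEEE-754 `float` is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw bit pattern of a float
 * (assumes a 32-bit IEEE-754 float). */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical scalar model of cvtsi2ss: an integer-to-float conversion.
 * The integer -0 is just 0, so the result is +0.0f with all bits clear. */
static float model_cvtsi2ss(int x) {
    return (float)x;
}

/* Hypothetical scalar model of movd + xorps on the lowest element:
 * XOR the float's bits with 0x80000000, which toggles only the sign bit. */
static float flip_sign(float f) {
    uint32_t u = float_bits(f) ^ 0x80000000u;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

With these helpers, `float_bits(model_cvtsi2ss(-0))` is `0x00000000`, so XORing with it leaves the sign bit untouched, whereas `flip_sign(1.5f)` yields `-1.5f`.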

pedzer
  • If you're going to avoid loading a constant from memory, `pcmpeqd xmm1,xmm1` / `pslld xmm1, 31` is about as efficient and materializes a vector of `set1(-0.0)`. ([What are the best instruction sequences to generate vector constants on the fly?](https://stackoverflow.com/q/35085059)). Both ways are 9 bytes of code, though, before the `xorps`. The only difference is which back-end ports the uops can run on. `movd` is limited to one port on Intel (port 5); `pcmpeqd` and `pslld` can each run on at least 2 ports on most CPUs. So the best choice depends on the surrounding code. The latency of `movd` is probably irrelevant. – Peter Cordes May 29 '20 at 19:22
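
The difference between the two masks can be sketched with a small lane-by-lane model in C. This is only an illustration under stated assumptions: the `xmm_bits` type and the function names are hypothetical, a register is modeled as four 32-bit lanes with lane 0 the least significant, and a 32-bit IEEE-754 `float` is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit XMM register as four 32-bit lanes;
 * lane[0] is the least significant element. */
typedef struct { uint32_t lane[4]; } xmm_bits;

/* Models mov eax, 0x80000000 / movd xmm1, eax.
 * movd zero-extends into the register, so lanes 1..3 are zero. */
static xmm_bits mask_movd(void) {
    xmm_bits m = { { 0x80000000u, 0u, 0u, 0u } };
    return m;
}

/* Models pcmpeqd xmm1, xmm1 / pslld xmm1, 31.
 * All bits are set first, then every lane is shifted left by 31,
 * leaving 0x80000000 (the bit pattern of -0.0f) in each lane. */
static xmm_bits mask_pcmpeqd_pslld(void) {
    xmm_bits m;
    for (int i = 0; i < 4; i++) {
        m.lane[i] = (uint32_t)(UINT32_C(0xFFFFFFFF) << 31);
    }
    return m;
}

/* Models xorps: lane-wise XOR of the floats' bit patterns with the mask. */
static void xor_lanes(float v[4], xmm_bits m) {
    for (int i = 0; i < 4; i++) {
        uint32_t u;
        memcpy(&u, &v[i], sizeof u);
        u ^= m.lane[i];
        memcpy(&v[i], &u, sizeof u);
    }
}
```

In this model, applying `mask_movd()` to `{1, 2, 3, 4}` negates only the first element, while `mask_pcmpeqd_pslld()` negates all four, which is why the on-the-fly mask only suits code that does not need the high elements left unmodified.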