
I noticed that when I cast an int to float, GCC (with -O3) inserts a PXOR instruction.

float
f(int n) {
    return n;
}

This generates:

f:
        pxor    xmm0, xmm0             ; this line. 
        cvtsi2ss        xmm0, edi
        ret

Question

  • I can't understand why it is needed here. This link says that CVTSI2SS is used for:

Converting Doubleword Integer to Scalar Single-Precision Floating-Point Value.

Based on that documentation, I don't understand why a PXOR needs to be inserted there.
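
The quoted one-line description leaves out the detail that matters here: CVTSI2SS writes only the low float of the destination XMM register and keeps the other three lanes, so the instruction also reads the register's old value. The sketch below, a hedged illustration rather than anything from the original post, makes that visible with the SSE intrinsic `_mm_cvtsi32_ss`, which corresponds to CVTSI2SS. The function name `cvtsi2ss_lanes` is hypothetical, and the code only builds for x86/x86-64 targets.

```c
#include <immintrin.h> /* SSE intrinsics; x86/x86-64 only */

/* _mm_cvtsi32_ss(a, b) corresponds to CVTSI2SS: it writes (float)b into
   lane 0 and copies lanes 1..3 from `a` unchanged, which is why the
   instruction has to read its destination register. */
void cvtsi2ss_lanes(float old_upper, int n, float out[4]) {
    /* Stand-in for whatever the register held before the conversion. */
    __m128 stale = _mm_set1_ps(old_upper);
    __m128 r = _mm_cvtsi32_ss(stale, n);
    _mm_storeu_ps(out, r); /* spill all four lanes for inspection */
}
```

Calling `cvtsi2ss_lanes(7.5f, 42, lanes)` leaves `42.0f` in `lanes[0]` and `7.5f` in the other three lanes: the old contents survive the conversion.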

Notes

  • I couldn't find anything by searching the internet (I'm having a really hard time finding answers on asm topics c: ).

  • Previously I asked why GCC inserts an "empty" XOR, and the explanation there was that I had a UB there and GCC saved me from me. But here, casting an int to float isn't UB (if I'm not wrong).

  • I thought that explicitly writing a cast to float might help, but it didn't.

float
g(int n) {
    return (float)n;
}

This produces the same asm:

g:
        pxor    xmm0, xmm0
        cvtsi2ss        xmm0, edi
        ret
  • If I'm not wrong, int is not guaranteed to be exactly 32 bits, unlike float, so I thought that might be the reason. But the following code didn't give me the output I was looking for:
#include <stdint.h>

float
h(int32_t n) {
    return (float)n;
}

This outputs:

h:
        pxor    xmm0, xmm0
        cvtsi2ss        xmm0, edi
        ret
  • It's also interesting that Clang doesn't emit the PXOR there.
  • Finally, the Godbolt link.
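
Both compilers return the same float; the only difference is whether XMM0's previous contents become an input to the conversion. The sketch below spells the two patterns with SSE intrinsics, assuming an x86/x86-64 target. The function names are hypothetical, and a compiler is still free to choose its own instruction sequence for them (`_mm_setzero_ps` usually becomes `xorps`, the FP-domain counterpart of `pxor`).

```c
#include <immintrin.h> /* SSE intrinsics; x86/x86-64 only */

/* GCC-style: start from an all-zero register, so the result
   depends only on n and not on any earlier instruction. */
float to_float_zeroed(int n) {
    return _mm_cvtss_f32(_mm_cvtsi32_ss(_mm_setzero_ps(), n));
}

/* Clang-style: convert into a register that still holds an older value.
   `prev` models that stale XMM0. Only lane 0 is returned, so the value is
   identical, but a literal CVTSI2SS into that register must wait for `prev`. */
float to_float_merged(__m128 prev, int n) {
    return _mm_cvtss_f32(_mm_cvtsi32_ss(prev, n));
}
```

`to_float_zeroed(n)` and `to_float_merged(prev, n)` agree for every `prev`; zeroing first only removes the dependency on whatever last wrote the register.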
    `cvtsi2ss` and similar were badly / shortsightedly designed by Intel to merge into the destination XMM register instead of zero-extending. GCC is cautious and spends extra instructions to break dependencies just in case XMM0 was at the end of a long dep chain independent from EDI. As usual, clang is reckless (although I think it might pxor inside a loop within a single function.) – Peter Cordes Apr 30 '21 at 11:36
  • Seems relevant: [**Usage of instruction pxor before SSE instruction cvtsi2ss**](https://stackoverflow.com/questions/42286779/usage-of-instruction-pxor-before-sse-instruction-cvtsi2ss) – Andrew Henle Apr 30 '21 at 11:36
  • @AndrewHenle what keywords did you use to find this question? c: –  Apr 30 '21 at 11:38
  • " I had a UB there and GCC saved me from me" is nonsense: if the behaviour is undefined, then the compiler does NOT have to save you; that's the whole reason UB exists – M.M Apr 30 '21 at 11:39
  • @PeterCordes can I ask why `CLANG` doesn't do it? If there's a bug there, can I somehow see it? What should I write instead of this (just trying to understand the concept here c: )? –  Apr 30 '21 at 11:40
    See my final edit to my previous comment: as usual, clang is just being reckless and hoping XMM0 wasn't last written by a cache-miss load or a dep chain involving one. – Peter Cordes Apr 30 '21 at 11:42
  • It seems the correctness of this would depend on the ABI, whether the caller is only supposed to read half of the register, or whether it may use all 64 bits – M.M Apr 30 '21 at 11:47
  • Possible duplicate [x86 - instruction interleaving to avoid cpu stall](https://stackoverflow.com/q/37906394) - I went into detail in multiple paragraphs about the bad design of `cvtsi2sd` that GCC is working around with PXOR, but it's not the main focus of the question. – Peter Cordes Apr 30 '21 at 11:47
  • @M.M: high garbage is allowed in registers outside the part that holds the real return value, in all mainstream x86 calling conventions. (I'm 100% sure about x86-64 System V; I even asked one of the ABI maintainers for clarification on that point once.) That goes for XMM regs and also for integer regs. e.g. `char foo(int x){return x;}` can do `mov eax, edi` instead of `movzx` if it wants. Or tailcall a function returning `int`. Also for xmm function args: apparently some FP libraries have had to be bugfixed when they caused spurious FP exceptions in high elements from scalar args. – Peter Cordes Apr 30 '21 at 11:49
  • @PeterCordes OK; neither of the two suggested duplicates so far mention that (I guess they seem to assume the asker already is aware, although in this question OP wasn't) – M.M Apr 30 '21 at 11:51
    @M.M: Yeah, fair point, the fact that clang omits `pxor` in this case could have been (but isn't) due to undocumented calling-convention extensions / modifications, unlike for incoming narrow-integer args where clang depends on an undocumented extension that it and GCC both follow on the calling side ([Is garbage allowed in high bits of parameter and return value registers in x86-64 SysV ABI?](https://stackoverflow.com/q/40475902)) – Peter Cordes Apr 30 '21 at 11:54
    Ah, there we go, I knew I'd written about this before, and fortunately not just as a pet-peeve footnote in other answers. [Why does adding an xorps instruction make this function using cvtsi2ss and addss ~5x faster?](https://stackoverflow.com/q/60688348) is 100% about exactly the reason that GCC is using PXOR and clang isn't, in a case where it matters. That answer collects up other related links, including the GCC missed-optimization bugs I've filed about better ways to work around it (with AVX). Also fun fact: some compilers (MSVC IIRC) use `movd xmm0, edi` / `cvtdq2ps` to save uops. – Peter Cordes Apr 30 '21 at 12:04
  • @Hrant I used https://www.google.com/search?q=pxor++++xmm0%2C+xmm0 – Andrew Henle Apr 30 '21 at 12:36
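
One comment above notes that some compilers (MSVC, IIRC) use `movd xmm0, edi` / `cvtdq2ps` instead. MOVD zero-extends into the whole XMM register, so nothing is merged and no separate zeroing instruction is needed. A minimal intrinsics sketch of that sequence, assuming SSE2 (always available on x86-64); the function name is hypothetical:

```c
#include <immintrin.h> /* SSE2 intrinsics; x86/x86-64 only */

/* MOVD + CVTDQ2PS: no dependency on the register's previous value. */
float to_float_movd_cvtdq2ps(int n) {
    __m128i v = _mm_cvtsi32_si128(n); /* movd: lane 0 = n, lanes 1..3 = 0 */
    __m128 f = _mm_cvtepi32_ps(v);    /* cvtdq2ps: converts all four lanes */
    return _mm_cvtss_f32(f);          /* lane 0 holds (float)n */
}
```

Both CVTDQ2PS and CVTSI2SS round according to MXCSR, so under the default round-to-nearest mode this matches `(float)n`; for example, 16777217 becomes 16777216.0f either way.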

0 Answers