
I'm trying to make an application behave identically on x86_64 and aarch64/arm64. However, the two differ in how they convert a floating-point number to an integer when the value is outside the integer type's range.

Consider the following example:

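#include <stdio.h>
#include <cstdint>

void cast(float value) {
  // uint32_t(value) is undefined behavior when the truncated value
  // does not fit in uint32_t, so each platform is free to differ here.
  printf("uint32_t(%.2f) = %u\n", value, uint32_t(value));
}

int main() {
  cast(4294967808.);  // 2^32 + 512, exactly representable as a float
}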

# output on x86_64:  uint32_t(4294967808.00) = 512
# output on aarch64: uint32_t(4294967808.00) = 4294967295

The x86_64 build uses cvttss2si for the conversion, which appears to wrap the result around, although the documentation is quite unclear on this. The aarch64 build uses fcvtzu, which saturates.

Any solution that aligns the two would be interesting, but ideally I'd like to set a compiler flag in clang that makes the aarch64 build behave like the x86_64 one (even though the aarch64 behavior is "nicer").
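One portable way to align the two (a sketch, not something proposed in the thread) is to stop relying on the undefined conversion and spell out the out-of-range behavior yourself. The hypothetical helper below reproduces the saturating fcvtzu semantics described above: NaN and negative inputs give 0, inputs at or above 2^32 give UINT32_MAX, and everything else truncates toward zero.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: float -> uint32_t with saturating semantics, i.e. the
// results fcvtzu produces on aarch64. Never invokes undefined behavior.
uint32_t to_u32_saturating(float value) {
    // NaN and anything <= 0 map to 0.
    if (std::isnan(value) || value <= 0.0f) {
        return 0;
    }
    // 2^32 is exactly representable as a float; anything at or above it
    // (including +infinity) saturates.
    if (value >= 4294967296.0f) {
        return UINT32_MAX;
    }
    // Here 0 < value < 2^32, so truncation toward zero is well-defined.
    return static_cast<uint32_t>(value);
}
```

With this, `to_u32_saturating(4294967808.0f)` yields 4294967295 on both platforms. It goes in the opposite direction from what the question asks for (x86 made to behave like aarch64), but it has the advantage of being plain, well-defined C++ with no target-specific instructions.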

Peter Cordes
Johannes Hoff
  • The behavior is undefined. Also, ARM has a special instruction for this, because JavaScript has specified this behavior (which is undefined in C) to be the behavior of x86, because of course JavaScript would do such a thing. So to run JavaScript quickly, ARM had to invent an instruction doing something stupid that nobody could ever want because of a bad language that too many people are using. – EOF Feb 19 '21 at 14:46
  • @EOF Well, defining a behavior isn't a bad design choice, and returning a sentinel value isn't inherently bad either. – Bob__ Feb 19 '21 at 14:53
  • Thanks for your answers. Do you know what that instruction is, EOF? Because I sure couldn't find it! – Johannes Hoff Feb 19 '21 at 14:59
  • Actually, EOF, you set me on the right track! The instruction is `fjcvtzs` (intrinsic `__builtin_arm_jcvt`). Thanks! – Johannes Hoff Feb 19 '21 at 15:16
  • [Why do ARM chips have an instruction with Javascript in the name (FJCVTZS)?](https://stackoverflow.com/q/50966676/995714) – phuclv Feb 19 '21 at 15:55
  • `cvttss2si` doesn't wrap around. To do float->uint32_t, the compiler uses it with 64-bit operand-size, i.e. it does `(uint32_t)(int64_t)f`, because that's how x86-64 can do it in one instruction. The high 32 bits of the 64-bit register aren't part of its value as a `uint32_t`. If the value were even larger, outside the range of `int64_t`, you'd get `0` (the low half of the integer-indefinite value the x86 manual documents, `0x8000000000000000`). TL;DR: you need to look at what destination register width `cvttss2si` is used with; that determines what the overflow cutoff is. – Peter Cordes Dec 26 '21 at 23:20
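The codegen described in the last comment can be modeled in portable C++, which makes the x86_64 results reproducible on any target. This is a sketch under the assumption that the compiler really emits a 64-bit `cvttss2si` (other configurations, for example ones using AVX-512's unsigned conversions, may generate different code); both function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: models cvttss2si with a 64-bit destination register.
// Truncates toward zero; NaN or results outside int64_t give the x86
// "integer indefinite" value 0x8000000000000000 (INT64_MIN).
int64_t emulate_cvttss2si64(float f) {
    // -2^63 and 2^63 are both exactly representable as float.
    if (std::isnan(f) || f >= 9223372036854775808.0f ||
        f < -9223372036854775808.0f) {
        return INT64_MIN;
    }
    return static_cast<int64_t>(f);  // in range: well-defined truncation
}

// (uint32_t)(int64_t)f without undefined behavior: keep only the low
// 32 bits of the 64-bit result (integer-to-unsigned casts are modular).
uint32_t to_u32_like_x86_64(float f) {
    return static_cast<uint32_t>(emulate_cvttss2si64(f));
}
```

For the value in the question, 2^32 + 512 fits in `int64_t`, so the low 32 bits are 512, which matches the x86_64 output. A float as large as `1e20f` falls outside `int64_t` and comes out as 0.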

1 Answer

Use the CPU instruction fjcvtzs (or the intrinsic __builtin_arm_jcvt) to get the x86 behavior on aarch64.
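A minimal sketch of how this could be wrapped, assuming clang on aarch64 with the Armv8.3-A JavaScript-conversion feature enabled (for example via `-march=armv8.3-a`), which is what makes `__ARM_FEATURE_JCVT` defined. fjcvtzs returns a signed 32-bit result reduced modulo 2^32 (JavaScript's ToInt32). The helper name and the portable fallback branch are illustrative additions, not part of the original answer.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: float -> uint32_t with modulo-2^32 wrap-around.
uint32_t to_u32_wrapping(float value) {
#if defined(__clang__) && defined(__aarch64__) && defined(__ARM_FEATURE_JCVT)
    // Compiles to fjcvtzs. The int result converts to uint32_t modulo 2^32.
    return static_cast<uint32_t>(__builtin_arm_jcvt(static_cast<double>(value)));
#else
    // Portable fallback with the same semantics: NaN/infinity -> 0,
    // otherwise truncate toward zero and reduce modulo 2^32.
    double d = static_cast<double>(value);
    if (!std::isfinite(d)) {
        return 0;
    }
    double m = std::fmod(std::trunc(d), 4294967296.0);  // exact, in (-2^32, 2^32)
    if (m < 0.0) {
        m += 4294967296.0;  // exact, now in (0, 2^32)
    }
    return static_cast<uint32_t>(m);
#endif
}
```

For float inputs, this should agree with the 64-bit cvttss2si path on x86_64 (512 for the value in the question). Floats whose magnitude is at least 2^63 are multiples of 2^40, so both approaches give 0 for them. For `double` inputs the two can diverge above 2^63, so check that case separately if it matters.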

(Thanks to @EOF for providing enough information in a comment for me to find the answer)

Johannes Hoff
  • You could have asked him to reword his comment as an answer, but that's just me. – Dec 26 '21 at 23:23