2

I'm trying to make a software that users can move in a wide range(at least 1Mly diameter range and at least 0.1mm position representation precision). I think of 128bit fixed point number to represent position. However, mathematical calculation(e.g. distance, sqrt, divide, integration) is not suitable for fixed(or integer), so I use double or single floating point for math. (Usually on the result of subtracting two int128 coordinates to get a relative distance, so usually the value is small enough to not lose too much precision, or the big diff values needn't so many precision.)

So I encountered a problem when implementing fixed128: how to do fast int128-double conversion with AVX2 SIMD? (AVX512 is not popular so I can't use it in this software)


What I've tried(A bit long, maybe it can be ignored):

I've referred to this answer:How to efficiently perform double/int64 conversions with SSE/AVX?

Wim's answer showed that when we need convert int64 to double, splitting multiple integer to less than 52bits long as significand and concating exponent bits in the left, the do fp math to reduce the extra exponents is efficient.
So I tried to split uint128 (consisting of two uint64s: ilow and ihigh) into three parts:

  • part1 v_lo: ilow's low 48 bits;

  • part2 v_mi: ilow's high 16 bits and ihigh's low 16bits;

  • part3 v_hi: lhigh's high 48 bits;

We can get the v_lo and v_hi with the method almost same as wim's "uint64_to_double_fast_precise", but part2 "v_mi" become a problem. it increased 4 instructions which is more than low+high(1+2).(my code following)
Maybe there's faster way by some magical swizzle with permute/shuffle/unpackhi/unpacklo/broadcast/blend or their combination? These swizzle intrinsic really swizzled me.

my code for ufixed128-double conversion:

constexpr auto fix128_frac_bits = 32;

__m256d ufixed128_to_double_fast(const __m256i& ihigh, const __m256i& ilow)
{
//constants
__m256d magic_d_hm = _mm256_set1_pd(pow(2.0, 52 + 48 - fix128_frac_bits) + pow(2.0, 52 + 80 - fix128_frac_bits));
__m256d magic_d_lo = _mm256_set1_pd(pow(2.0, 52 - fix128_frac_bits));
__m256i magic_i_lo = _mm256_castpd_si256(magic_d_lo);
__m256i magic_i_mi = _mm256_castpd_si256(_mm256_set1_pd(pow(2.0, 52 + 48 - fix128_frac_bits)));
__m256i magic_i_hi = _mm256_castpd_si256(_mm256_set1_pd(pow(2.0, 52 + 80 - fix128_frac_bits)));
//majik operations
__m256i v_lo = _mm256_blend_epi16(ilow, magic_i_lo, 0b10001000);
__m256i v_mi = _mm256_slli_epi64(ihigh, 16);
__m256i losr48 = _mm256_srli_epi64(ilow, 48);
v_mi = _mm256_xor_si256(v_mi, losr48);
v_mi = _mm256_blend_epi32(magic_i_mi, v_mi, 0b01010101);
__m256i v_hi = _mm256_srli_epi64(ihigh, 16);
v_hi = _mm256_xor_si256(v_hi, magic_i_hi);
//final fp 
__m256d loresult = _mm256_sub_pd(_mm256_castsi256_pd(v_lo), magic_d_lo);
__m256d result = _mm256_sub_pd(_mm256_castsi256_pd(v_hi), magic_d_hm);
result = _mm256_add_pd(result, _mm256_castsi256_pd(v_mi));
result = _mm256_add_pd(result, loresult);
return result;
}

Edit: I've successfully made signed fixed128_to_double, just fp64 add '2.0^(127 - fix128_frac_bits)' into constant 'magic_d_hm' and 'magic_i_hi'. But there's no fast 'double_to_int128' and 'double_to_uint128' which I have no idea. I can do it faster than C++ 'static_cast' scalar convert with do bit operstions(mask out exponent and sign, and concat hidden 1,and do left/right shift), but it's much slower than thouse magical ops and use a lot of registers for constants.

Can anyone help me?

If I'm in a blind alley, and there's a better method than fixed128/double-double to represent the wide range position, please tell me. (Except floating-origin or floating-grid(int64-double):they are unstable for physics, or exposes a lot of complexity to the upper construction, or hard to do AVX acceleration.)

About double-double: I planned to compare performance between fixed128 and double-double after highly optimized them, and decide which to use after that. That's another work I'm doing.

my current codes: https://github.com/Veloctor/Int128

Velctor
  • 21
  • 4
  • If you're going to convert your numbers to double-double some of the time, perhaps just always keep them in that form. That's about 107 mantissa bits of precision at any magnitude, with 11 exponent bits of range (https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format#Double-double_arithmetic). – Peter Cordes Aug 10 '22 at 21:17
  • Your question *title* seems to be asking about converting to `double`, which has much less precision than a large `int128` (but *more* precision for small numbers than fixed-point `int128`); `double` would be a really weird choice. GCC has a helper function for `__int128` to `double`, but neither it nor clang know how to vectorize it: https://godbolt.org/z/ajs548c4x . But you can look at the libgcc code for `__floattidf`. (https://github.com/gcc-mirror/gcc/blob/master/libgcc/soft-fp/floattidf.c and related headers, although that seems to be pure software FP.) – Peter Cordes Aug 10 '22 at 21:17
  • @PeterCordes thanks for comment. Imagine this scene: two mass point at a very far distance to coordinate origin(e.g. 1kly), but their distance is close(e.g. 1km), their coordinate position number need huge amount of precision, but relative position don't. Converting the relative pos to fp then do other things(e.g. interaction, rendering) will be meaningful. – Velctor Aug 10 '22 at 22:52
  • Ah, that makes sense. If the relative position fits in an `int64_t`, then you can use hardware support. (SIMD with AVX-512, or with the tricks you found for vector conversion to double. Or scalar conversion directly in hardware without AVX-512.) – Peter Cordes Aug 10 '22 at 23:07
  • @PeterCordes This is the pity: on Steam hardware statistics, The penetration rate of avx512 is only 7.13%, and it is still decreasing, so I can't place my hopes on it.(If mm*_cvtpd_epi64 and another three instruction in AVX512 is popular, there will have no this post.) But force interaction(sqrt and dot) and rendering(GPU can only hold fp32 efficiently) can't leave this. And if we need some far relative pos like render planets at AUs distance, we can't assume it's in a int64's range. – Velctor Aug 10 '22 at 23:36
  • 1
    Your `magic_d_all` constant cannot be represented exactly as a `double`, you need to split that into at least two constants. – chtz Aug 10 '22 at 23:51
  • BTW: I guess it will not be possible to get a correctly rounding conversion based on adding three doubles, e.g., if your int128 value is `2^126+2^(126-53)+1`, this should round up to `2^126+2^(126-52)`, but adding either pair of terms will always round down. If you are fine with 1ULP error, I assume your implementation should work (after splitting up the `magic_d_all` constant) – chtz Aug 11 '22 at 10:04
  • 1
    Are you sure you need these complications? Real-world space vessels are using FP64 numbers just fine. – Soonts Aug 11 '22 at 10:17
  • @Soonts enough for engineering(which only care about relative precision), but not enough for visual(care about absolute precision). e.g. free space, 1kly from origin, error of fp64 is 2048m, in real world we don't care about this or we can reference nearyby star for coord origin, but for virtual visual it's unacceptable and too expensive and complex to change origins(it's floating origin method) – Velctor Aug 11 '22 at 15:27
  • 1
    @Velctor Changing origins is easy and shouldn’t cause any issues. Far away from a stellar system, you only see a single star and 2km error is fine. Once approached a stellar system switch to the origin of that system, all your coordinates become well under 1E+16 meters and FP64 error diminishes to under 1 meter. Similarly, once approached a planetary system, switch there and you’ll have your sub-mm precision very suitable for accurately rendering your stuff. – Soonts Aug 11 '22 at 16:26
  • @Soonts that's already a problem: two vessel at far away from stellar system and they are close(100m?), In one of them's view, what will another vessel look like? maybe look like a electron in proton's view "randomly teleport anywhere" aroud it, I don't know, but it can't be continuous motion. If the origin is based on "grids" or "chunks" in the space, just like Minecraft, the position representation becames int64+double, which is more complex and performance-heavy than two integers. – Velctor Aug 12 '22 at 13:00
  • 1
    @chtz Thanks, I'v fixed this. Now the convert is correct – Velctor Aug 12 '22 at 14:03

0 Answers0