1

I'm doing some stuff with some crypto libraries and Zero Knowledge Proofs and the libraries I'm using only support Integers.

If I have two numbers stored as Doubles Precision Floats, and I copy the bits for each number into an Integer and I then compare the Integer will.

A_Double>B_Double==A_UINT>B_UINT

for all values.

My guess is for all positive values it might work, but not for negative values.

Although it might work if I flip the first bit.

  • Does this answer your question? [Comparing floats in their bit representations](https://stackoverflow.com/questions/2582417/comparing-floats-in-their-bit-representations) – Falk Hüffner Dec 02 '20 at 07:52
  • 1
    If you ignore NaN values, you need to flip all but the sign bit if the sign-bit is set. Then you can do signed integer comparison. – chtz Dec 02 '20 at 09:43
  • You have not stated what floating-point format is being used. – Eric Postpischil Dec 02 '20 at 09:59

2 Answers2

1

You have not stated the floating-point format used. IEEE 754, and most floating-point systems, use a format that is effectively sign-and-magnitude. The bit in the most significant position is a sign bit. Negating the value is done by flipping the sign bit. In contrast, negating the value in a two’s complement format is done by flipping all the bits and adding one.

Floating-point formats commonly follow the sign bit by the bits that are next most significant for the represent value, the exponent bits, and then by the significand bits. A consequence of this is that for two positive floating-point numbers x and y, x > y is equivalent to x > y, where x and y are the reinterpretations of the bits of x and y as either unsigned or two’s complement integers. However, for two negative floating-point numbers, x > y is equivalent to x < y, again whether unsigned or two’s complement.

Further, floating-point formats commonly have non-number values, called NaNs. With these values, integer comparisons fail because the floating-point comparisons are defined to be false for all comparisons—a NaN is not less than, equal to, or greater than a number or a NaN.

In summary, you generally do not want to compare floating-point values using integer interpretations of their bits.

Eric Postpischil
  • 195,579
  • 13
  • 168
  • 312
1

On a close track

With a lot of assumptions about the floating point encoding:

When the integer version of a < 0 flip the value, folding at 0footnote so +0.0 and -0.0 compare as equal. FP values are often encoded like sign-magnitude.

Also look for Nan encodings.


A C solution

#include <stdint.h>
#include <stdlib.h>

_Static_assert(sizeof(int64) == sizeof(double), "Unexpected sizes");

#define INT_NAN_TEST(a) (llabs(a) > 0x7FF0000000000000u)

/*
 * Compare 2 integers as is they overlay a same size, same endian 64-bit `double`.
 * Return -1,0,1 when they compare a > b, a===b, a< b
 * or `'?` for un-comparable as at least one of `a,b` is NaN
 *
 * This code assumes a lot about the representation of a `double`.
 * Same size
 * Same endian
 * FP with leading sign bit, then a biased exponent, then significand
 */
int cmp_double(int64_t a, int64_t b) {
  if (a < 0) {
    a = 0x8000000000000000 - a;
  }
  if (b < 0) {
    b = 0x8000000000000000 - b;
  }
  if (INT_NAN_TEST(a) || INT_NAN_TEST(b))
    return '?';
  return (a > b) - (a < b);
}

Test

#include <stdlib.h>
#include <stdio.h>
#include <float.h>
#include <limits.h>
#include <string.h>
#include <math.h>

int main() {
  const double d[] = {0.0, DBL_TRUE_MIN, DBL_MIN, 1, DBL_MAX, INFINITY, -0.0,
      -DBL_TRUE_MIN, -DBL_MIN, -1, -DBL_MAX, -INFINITY, NAN};
  size_t n = sizeof d / sizeof *d;
  for (size_t i = 0; i < n; i++) {
    double a = d[i];
    int64_t ai;
    memcpy(&ai, &a, sizeof ai);
    for (size_t bb = 0; bb < n; bb++) {
      double b = d[bb];
      int64_t bi;
      memcpy(&bi, &b, sizeof bi);
      int cmp_d = (isnan(a) || isnan(b)) ? '?' : (a > b) - (a < b);
      int cmp_i = cmp_double(ai, bi);
      if (cmp_d != cmp_i) {
        printf("% 20g %16llX % 20g %16llX: %2d %2d\n", a, 1llu * ai, b,
            1llu * bi, cmp_d, cmp_i);
      }
    }
  }
  puts("Done");
}

footnote By folding negative numbers so 0000_0000_0000_0000 and 8000_0000_0000_0000 are both "zero", the most negative "double as an integer" value will be 1 more than the most negative encodable integer, thus preventing UB in the llabs() calculation.

chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
  • Thank this was the most useful, even though I was using a different language it was easy enough to follow and implement. Especially because I was able to implement a much easier implementation that only used positive numbers. – Some Guy Trying To Do Math Dec 05 '20 at 19:09