9

I'm using the Zarith library to do arbitrary-precision rational arithmetic. Suppose I have a rational number q of type Q.t that is the ratio of two big integers (Q is Zarith's arbitrary-precision rational number module). Sometimes, I would like to print this number as a floating point number for readability, and sometimes I need to convert this number floating-point for later non-arbitrary-precision calculations. Is there a way to convert q to a floating-point number up to a certain level of precision?

The way I am converting q to floating-point now has no guarantees, and can create undefined floating-point numbers (Z is the arbitrary-precision integer module):

let to_float q =
  let n, d = num q, den q in
  (* check if d is zero and raise an error if it is *)
  let nf, df = Z.to_float n, Z.to_float d in
  nf /. df

Is there a better way to handle this, where I can obtain a floating-point that most accurately approximates any q?

Edit

I quickly wrote up Mark Dickinson's answer in OCaml, if anyone is interested. It can probably (definitely) be improved and cleaned up. I'll edit if I do so or if anyone has any advice for improvements. But for now this has solved my problem!

let to_float q = 
  let n, d = num q, den q in
  let n_sign = Z.sign n in
  let d_sign = Z.sign d in (* always >= 0 *)
  if d_sign = 0 then raise Division_by_zero;
  let n = Z.abs n in
  if n_sign = 0 then 0. else
    let shift = (Z.numbits n) - (Z.numbits d) - 55 in
    let is_subnormal = shift < -1076 in
    let shift = if is_subnormal then -1076 else shift in
    let d = if shift >= 0 then Z.shift_left d shift else d in
    let n = if shift < 0 then Z.shift_left n (-shift)
      else n in
    let quotient, remainder = Z.div_rem n d in
    let quotient = if (Z.compare remainder (Z.zero)) = 0 && Z.is_even quotient then
        Z.add Z.one quotient else quotient in
    let quotient = if not is_subnormal then quotient else
        let round_select = Z.to_int @@ Z.rem quotient @@ Z.of_int 8 in
        Z.add quotient [|Z.zero;Z.minus_one;Z.of_int (-2);Z.one;Z.zero
                        ;Z.minus_one;Z.of_int 2;Z.one|].(round_select)
    in
    let unsigned_res = ldexp (Z.to_float quotient) shift in                                                                                                             
    if n_sign = 1 then unsigned_res else -.unsigned_res

I'll look into writing an interface for GMP's mpq_get_d function later, but I'm not entirely sure how to do that. The only way I see how to do that is to convert q : Q.t to a string and pass that to:

int mpq_set_str (mpq_t rop, const char *str, int base)

Does anyone know how to then pass rop to mpq_get_d in OCaml or have a reference that describes how to do this? I looked through chapter 19 of RWO and didn't see a situation like this.

Dimitrios
  • 301
  • 3
  • 11
  • Please define "better": what do you care about? Speed? Accuracy? Something else? If you're interested in getting accurate results at the expense of speed, then you can use integer arithmetic operations to find the *closest* floating-point number to the actual value of `q`. This would also have the advantage of still giving good results when the numerator and denominator are outside the usual range of float, but the value of `q` is within range. Your current approach could have an error of up to 2 ulps, which may not matter for your applications. – Mark Dickinson Nov 10 '15 at 06:41
  • @MarkDickinson: Accuracy is most important for me. What I am currently doing cannot handle big integers in the numerator and denominator, but where `q` is within the usual floating-point range. It will just be `undef` in this case. – Dimitrios Nov 10 '15 at 06:48
  • @Dimitrios there is a wrapper around Zarith, and it seems they use the same algo. See https://github.com/janestreet/bignum/blob/master/src/bignum0.ml#L584 – Stas Nov 10 '15 at 08:46
  • this has been implemented in the mean time: https://github.com/ocaml/Zarith/issues/7 – Michael Nov 30 '18 at 14:12

2 Answers2

5

If you have access to

  • an integer log2 operation, and
  • the ability to left shift an integer by a given number of bits

then it's relatively easy to roll your own correctly rounded conversion. In a nutshell, the method looks like this:

  1. Reduce to the case n > 0, d > 0; filter out obvious underflow/overflow
  2. Choose an integer shift so that 2^-shift*n/d lies between 2^54 and 2^56.
  3. Use integer arithmetic to compute x = 2^-shift*n/d, rounded to the nearest integer using the round-to-odd rounding method.
  4. Convert x to the nearest IEEE 754 double-precision value dx, with the usual round-ties-to-even rounding mode.
  5. Return ldexp(dx, shift).

I'm afraid I'm not fluent in OCaml, but the following Python code illustrates the idea for positive inputs. I leave it to you to make the obvious modifications for negative inputs and for division by zero. You might also want to make an early return for cases of extreme overflow and underflow: those are easy to detect by looking for extra large or small values of shift below.

from math import ldexp

def to_float(numerator, denominator):
    """
    Convert numerator / denominator to float, correctly rounded.

    For simplicity, assume both inputs are positive.
    """
    # Shift satisfies 2**54 < (numerator / denominator) / 2**shift < 2**56
    shift = numerator.bit_length() - denominator.bit_length() - 55

    # Divide the fraction by 2**shift.
    if shift >= 0:
        denominator <<= shift
    else:
        numerator <<= -shift

    # Convert to the nearest integer, using round-to-odd.
    q, r = divmod(numerator, denominator)
    if r != 0 and q % 2 == 0:
        q += 1

    # Now convert to the nearest float and shift back.
    return ldexp(float(q), shift)

Some notes:

  • the bit_length method on a positive integer n gives the number of bits required to represent n, or in other words 1 + floor(log2(n)).
  • divmod is a Python function that computes the quotient and remainder of an integer division at the same time.
  • the quantity q fits (easily) in a 64-bit integer
  • we're rounding twice: once when converting the shifted numerator / denominator to the nearest integer, and again when rounding that integer to a float. The first round uses the round-to-odd method; this ensures that the second round (implicit in the conversion from int to float) gives the same result as if we'd rounded the fraction directly to a float.
  • the above algorithm doesn't correctly handle fractions whose converted float value is subnormal: in that case, the ldexp operation introduces potentially a third rounding. It's possible to deal with this, with some care. See below for some code.

The above is in fact a simplified version of the algorithm that Python uses when dividing one (big) integer by another to get a floating-point result. You can see the source here. The comment at the beginning of the long_true_divide function gives an overview of the method.

For completeness, here's a variant that also deals correctly with subnormal results.

def to_float(numerator, denominator):
    """
    Convert numerator / denominator to float, correctly rounded.

    For simplicity, assume both inputs are positive.
    """
    # Choose shift so that 2**54 < numerator / denominator / 2**shift < 2**56
    shift = numerator.bit_length() - denominator.bit_length() - 55

    # The 'treat_as_subnormal' flag catches all cases of subnormal results,
    # along with some cases where the result is not subnormal but *is* still
    # smaller than 2**-1021. In all these cases, it's sufficient to find the
    # closest integer multiple of 2**-1074. We first round to the nearest
    # multiple of 2**-1076 using round-to-odd.
    treat_as_subnormal = shift < -1076
    if treat_as_subnormal:
        shift = -1076

    # Divide the fraction by 2**shift.
    if shift >= 0:
        denominator <<= shift
    else:
        numerator <<= -shift

    # Convert to the nearest integer, using round-to-odd.
    q, r = divmod(numerator, denominator)
    if r != 0 and q % 2 == 0:
        q += 1

    # Now convert to the nearest float and shift back.
    if treat_as_subnormal:
        # Round to the nearest multiple of 4, rounding ties to
        # the nearest multiple of 8. This avoids double rounding
        # from the ldexp call below.
        q += [0, -1, -2, 1, 0, -1, 2, 1][q%8]

    return ldexp(float(q), shift)
Mark Dickinson
  • 29,088
  • 9
  • 83
  • 120
  • I like the round-to-odd trick. But you are aware that ldexp might itself be inexact in case of underflow. So how do you avoid double rounding for fractions that would underflow? – aka.nice Nov 10 '15 at 21:22
  • 1
    @aka.nice: For results that are going to underflow, you simply want to round to the nearest integer multiple of `2**-1074`, and a variant of the code above will do that. The notes in the Python code go into this in some detail. – Mark Dickinson Nov 10 '15 at 21:44
  • Nice! That sounds a little bit simpler than Squeak/Pharo Smalltalk version then, but also because divmod rounds quotient to nearest integer, in Smalltalk I only had the truncated quotient. I prefer to read the shorter version though http://smallissimo.blogspot.fr/2011/09/reviewing-fraction-asfloat.html – aka.nice Nov 10 '15 at 21:55
  • Nice article! Thanks for that link. For completeness, I've added a variant that deals correctly with the subnormal case. – Mark Dickinson Nov 10 '15 at 22:17
  • 1
    And as a result of this question and answer, I realised that there's a bug in Python's implementation: http://bugs.python.org/issue25604 – Mark Dickinson Nov 12 '15 at 08:53
  • This algorithm makes sense, but I feel it would be less tricky to just do the rounding with integer arithmetic, making sure the integer results ends up with exactly 53 bits, then converting to float. This way we wouldn't have to deal with double rounding at all (except for denormals?). Is there a reason why this doesn't work? – zale Jun 20 '19 at 23:55
  • 1
    @zale: Yes, that also works, and in fact that's what I did in the [CPython source](https://github.com/python/cpython/blob/e30464040a38c7b038621768b7e9ba8a083ee702/Objects/longobject.c#L3805-L3810); it has the additional advantage that you don't have to worry about someone changing the FPU rounding mode. The code ends up a touch more complicated, though: you need to first figure out how many bits there are in `q` so you know how many to round away (should be either 2 or 3, but you have to figure out which), then do the rounding. I wanted to keep the answer in this question simple. – Mark Dickinson Jun 21 '19 at 06:51
2

This isn't a complete answer, but in looking around I found that Zarith uses GMP internally. There is a GMP function named mpq_get_d that converts a rational to a double. If it's not available directly in Zarith, it should be possible (given some time) to add an interface for it.

Jeffrey Scofield
  • 65,646
  • 2
  • 72
  • 108
  • It isn't yet available in Zarith, so it seems I will need to wait or try adding an interface to it myself. Is there no other alternative? – Dimitrios Nov 10 '15 at 07:08
  • I read the source of `mpq_get_d` and it looks pretty hairy. In particular, it uses other parts of GMP (not just simple operations). If I wanted to do conversions to double (which it seems like *everybody* would at some point :-), I'd probably (try to) hack a call to `mpq_get_d` into Zarith. I have no idea if somebody has already done this, or reimplemented it in OCaml (say). – Jeffrey Scofield Nov 10 '15 at 07:14
  • 1
    Speaking as a Zarith contributor, adding bindings for `mpq_get_d` is clearly the right way to do this. Each `Z.t` in the definition of `Q.t` is either an unboxed integer or an allocated representation. The stub for `mpq_get_d` should create the arguments for `mpq_get_d` from these two `Z.t` and call it. – Pascal Cuoq Nov 10 '15 at 08:31
  • Thanks for pointing this out. @PascalCuoq: This does seem ideal, but it would take me some time to figure it out. – Dimitrios Nov 11 '15 at 05:13