I'm using the Zarith library to do arbitrary-precision rational arithmetic. Suppose I have a rational number q
of type Q.t
that is the ratio of two big integers (Q
is Zarith's arbitrary-precision rational number module). Sometimes, I would like to print this number as a floating point number for readability, and sometimes I need to convert this number floating-point for later non-arbitrary-precision calculations. Is there a way to convert q
to a floating-point number up to a certain level of precision?
The way I am converting q
to floating-point now has no guarantees, and can create undefined floating-point numbers (Z
is the arbitrary-precision integer module):
let to_float q =
let n, d = num q, den q in
(* check if d is zero and raise an error if it is *)
let nf, df = Z.to_float n, Z.to_float d in
nf /. df
Is there a better way to handle this, where I can obtain a floating-point that most accurately approximates any q
?
Edit
I quickly wrote up Mark Dickinson's answer in OCaml, if anyone is interested. It can probably (definitely) be improved and cleaned up. I'll edit if I do so or if anyone has any advice for improvements. But for now this has solved my problem!
let to_float q =
let n, d = num q, den q in
let n_sign = Z.sign n in
let d_sign = Z.sign d in (* always >= 0 *)
if d_sign = 0 then raise Division_by_zero;
let n = Z.abs n in
if n_sign = 0 then 0. else
let shift = (Z.numbits n) - (Z.numbits d) - 55 in
let is_subnormal = shift < -1076 in
let shift = if is_subnormal then -1076 else shift in
let d = if shift >= 0 then Z.shift_left d shift else d in
let n = if shift < 0 then Z.shift_left n (-shift)
else n in
let quotient, remainder = Z.div_rem n d in
let quotient = if (Z.compare remainder (Z.zero)) = 0 && Z.is_even quotient then
Z.add Z.one quotient else quotient in
let quotient = if not is_subnormal then quotient else
let round_select = Z.to_int @@ Z.rem quotient @@ Z.of_int 8 in
Z.add quotient [|Z.zero;Z.minus_one;Z.of_int (-2);Z.one;Z.zero
;Z.minus_one;Z.of_int 2;Z.one|].(round_select)
in
let unsigned_res = ldexp (Z.to_float quotient) shift in
if n_sign = 1 then unsigned_res else -.unsigned_res
I'll look into writing an interface for GMP's mpq_get_d
function later, but I'm not entirely sure how to do that. The only way I see how to do that is to convert q : Q.t
to a string and pass that to:
int mpq_set_str (mpq_t rop, const char *str, int base)
Does anyone know how to then pass rop
to mpq_get_d
in OCaml or have a reference that describes how to do this? I looked through chapter 19 of RWO and didn't see a situation like this.