Converting a long double to double with upward (or downward) rounding

Question

Assume that we are working on a platform where the type long double has strictly greater precision than the 64-bit double. What is the fastest way to convert a given long double to an ordinary double-precision number with some prescribed rounding (upward, downward, round-to-nearest, etc.)?

To make the question more precise, let me give a concrete example: Let N be a given long double, and let M be the smallest double such that M >= N, i.e. the double that minimizes the (real) value M - N subject to M >= N. This M is the upward-rounded conversion I want to find. (If N is exactly representable as a double, then M = N.)

Can I get away with setting the rounding mode of the FP environment appropriately and performing a simple cast (e.g. (double) N)?

Clarification: You can assume that the platform supports the IEEE Standard for Floating-Point Arithmetic (IEEE 754).

Certainly FP hardware would be faster than any bit manipulation. Are you looking for a software solution? — chux - Reinstate Monica, Nov 25 '14 at 02:57
Your question is platform-specific, but your proposed solution is exactly how I'd go about this task. — tmyklebu, Nov 25 '14 at 03:31
@chux : That is exactly what I meant, thanks for the correction. — iheap, Nov 25 '14 at 07:20
@tmyklebu : You are right, you can assume that the underlying platform supports the IEEE 754 standard. I added this clarification to the question body as well. — iheap, Nov 25 '14 at 07:23
@chux : First of all, I would like to know whether the solution I proposed always works under the stated assumptions (i.e. IEEE 754). I am also interested in a more general software solution. For example, a binary-type search should work, but is probably not optimal. — iheap, Nov 25 '14 at 07:38
If you are working with `gcc` it will, see for example [**Rounding Modes**](http://www.gnu.org/software/libc/manual/html_node/Rounding.html). It may vary by compiler. — David C. Rankin, Nov 25 '14 at 07:53
Apart from the answer I have written, I was about to ask a question about “converting from `double` to `float` rounding down in Java”, bearing in mind that Java does not allow changing the rounding mode from the round-to-nearest default. Multiplying by `0x1.ffffff0001p-1` before rounding means that most positive `double`s are converted correctly (to the `float` immediately below them), but finding a sequence of operations that works for all cases may be tricky. To convert from `long double` to `double` rounding down, the constant would be `0x1.fffffffffffff8Lp-1`. — Pascal Cuoq, Nov 25 '14 at 15:18
@PascalCuoq: In Java, it's better to do the bit-fiddling yourself. This holds for just about everything the Java designers didn't think about in 1994 or whatever. — tmyklebu, Nov 25 '14 at 15:45
@tmyklebu I have gotten used to bit-fiddle with sequences like `frexp` -> access significand as integer -> `ldexp`, which save me from having to remember exponent biases and usually simplify the case of denormals, but Java does not even appear to have `ldexp` and `frexp`. — Pascal Cuoq, Nov 25 '14 at 16:35
@PascalCuoq: Correct. You have to do the bit-fiddling yourself. I don't even try to write finicky FP code in Java for these sorts of reasons. — tmyklebu, Nov 25 '14 at 16:57

score 3 · Accepted Answer · edited May 23 '17 at 10:33

Can I get away with setting the rounding mode of the FP environment appropriately and performing a simple cast (e.g. (double) N)?

Yes, as long as the compiler implements IEEE 754 (most of them do, at least roughly). Conversion from one floating-point format to another is one of the operations to which, according to IEEE 754, the rounding mode applies. To convert from long double to double rounding upward, set the rounding mode to upward and do the conversion.

In C99, which should be accepted by C++ compilers (I'm not sure a syntax is specified for C++):

```c
#include <fenv.h>
#pragma STDC FENV_ACCESS ON
…
fesetround(FE_UPWARD);
double d = (double) ld;
```

PS: you may discover that your compiler does not implement #pragma STDC FENV_ACCESS ON properly. Welcome to the club.
