Possible Duplicate:
Real numbers - how to determine whether float or double is required?

I'm trying to check if a conversion from double to float will result in loss of precision. Obviously, I can do the conversion and convert the float back into double and compare it to the original value. I'm curious as to whether there's a more direct way.

cleong
  • Related: http://stackoverflow.com/questions/11772776/how-are-double-precision-floating-point-numbers-converted-to-single-precision-fl – NominSim Jan 08 '13 at 14:33
  • The answer is in here, if you do a bit of math: http://stackoverflow.com/questions/5098558/float-vs-double-precision – Richard A. Jan 08 '13 at 14:34
  • Please clarify. What is safe? You always lose bits; do you consider it safe to lose bits if and only if they were all set to 0? – MSalters Jan 08 '13 at 15:03
  • The question you should be asking is not whether you will lose precision, but whether you will lose enough precision to *matter*. Most floating-point numbers start off representing imprecise quantities, and the results of most mathematical computations end up being rounded at some point, whether they're rounded to a pixel coordinate or RGB value for graphical-display purposes, rounded to some number of digits for numerical-display purposes, etc. The fundamental question is whether early rounding will add an unacceptable amount of uncertainty to the result beyond what's already there. – supercat Feb 04 '13 at 17:48

2 Answers

Converting to float and back is generally the most efficient solution; on most common architectures it requires only a couple of instructions, each with a latency of a couple of cycles. It also has the virtue of being both simple and correct.
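As a minimal sketch of that round trip in C: the helper name `fits_in_float` is made up for illustration, and the code assumes an IEEE 754 implementation, where a `double` too large for `float` converts to an infinity (strictly, ISO C leaves that case undefined without Annex F). Treating NaN as "fits" is also an assumption about what you want.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true if d survives double -> float -> double unchanged.
   Assumes IEEE 754 conversions (out-of-range values become +/-infinity). */
bool fits_in_float(double d)
{
    if (isnan(d))
        return true;  /* NaN never compares equal to itself, so decide it explicitly (assumption) */

    /* volatile discourages the compiler from keeping extra precision in a register */
    volatile float f = (float)d;

    return (double)f == d;
}
```

For example, `fits_in_float(0.5)` is true because 0.5 is a power of two, while `fits_in_float(0.1)` is false because the nearest `float` to 0.1 differs from the nearest `double`.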

On platforms that do not have hardware support for floating-point, you can do the check more efficiently by taking the number apart and checking whether the exponent and significand fit into single precision. That is a relatively uncommon corner case, though, and the approach is much more error-prone and not portable to platforms that use different FP formats.
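For the curious, here is a rough sketch of what "taking the number apart" could look like. It assumes `double` is IEEE 754 binary64 and `float` is binary32, which is exactly the portability caveat mentioned above; the name `fits_in_float_bits` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper; assumes IEEE 754 binary64 double and binary32 float. */
bool fits_in_float_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);               /* read the bit pattern without aliasing issues */

    uint64_t frac = bits & ((UINT64_C(1) << 52) - 1);   /* 52 stored significand bits */
    int biased = (int)((bits >> 52) & 0x7FF);           /* 11 exponent bits */

    if (biased == 0x7FF)
        return true;                              /* infinity or NaN: float has both */
    if (biased == 0)
        return frac == 0;                         /* zero fits; double subnormals are far too small */

    int e = biased - 1023;                        /* unbiased exponent */
    if (e > 127)
        return false;                             /* larger than any finite float */
    if (e >= -126)                                /* float normal range: 23 bits kept, low 29 must be 0 */
        return (frac & ((UINT64_C(1) << 29) - 1)) == 0;
    if (e >= -149) {                              /* float subnormal range: multiples of 2^-149 only */
        uint64_t m = frac | (UINT64_C(1) << 52);  /* significand with the implicit leading 1 */
        int shift = -97 - e;                      /* number of low bits that would be dropped (30..52) */
        return (m & ((UINT64_C(1) << shift) - 1)) == 0;
    }
    return false;                                 /* below the smallest float subnormal */
}
```

The subnormal branch is where most of the error-proneness lives: each step down in exponent costs one more bit of significand, so the number of low bits that must be zero grows from 30 at 2^-127 to 52 at 2^-149.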

Stephen Canon

A floating-point number has two parts, the mantissa and the exponent. A double has more bits assigned to both parts. Assigning a double to a float will drop mantissa bits, which gives you fewer digits of precision, as is to be expected. However, if the double exponent doesn't fit in the float exponent, then the float will be a garbage value.

brian beuning
  • “if the double exponent doesn't fit in the float exponent, then the float will be a garbage value” Converting to `float` a `double` with too low an exponent to be represented should give 0 (or a subnormal if some bits of the significand can still be represented in a `float` subnormal). Converting a `double` with too high an exponent should give +inf or -inf. – Pascal Cuoq Jan 08 '13 at 15:18
  • Never knew that. Thanks! – brian beuning Jan 08 '13 at 17:01
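
The behavior described in the comment above can be observed with a small sketch like the following. It assumes IEEE 754 semantics for the conversion; the enum and the function name `classify_narrowing` are invented for illustration.

```c
#include <math.h>

/* Hypothetical labels for what a double -> float conversion did to the value. */
typedef enum {
    NARROW_EXACT,      /* value unchanged */
    NARROW_ROUNDED,    /* value changed, result still finite and nonzero */
    NARROW_OVERFLOW,   /* finite value became +inf or -inf */
    NARROW_UNDERFLOW   /* nonzero value became zero */
} narrow_result;

/* Assumes IEEE 754 conversions; ISO C alone does not define the out-of-range case. */
narrow_result classify_narrowing(double d)
{
    if (isnan(d))
        return NARROW_EXACT;          /* assumption: NaN is treated as preserved */

    volatile float f = (float)d;      /* volatile discourages excess precision */

    if ((double)f == d)
        return NARROW_EXACT;
    if (isinf(f))
        return NARROW_OVERFLOW;       /* exponent too high for float */
    if (f == 0.0f)
        return NARROW_UNDERFLOW;      /* exponent too low even for a float subnormal */
    return NARROW_ROUNDED;            /* includes results that land on a float subnormal */
}
```

With this, `1e300` classifies as overflow, `1e-300` as underflow, and `1e-40` as rounded, since it lands on a `float` subnormal rather than on zero.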