3

Possible Duplicate:
Which is the first integer that an IEEE 754 float is incapable of representing exactly?

This is a basic question; my feeling is that the answer is yes (int = 32 bits, double = 53-bit mantissa + sign bit).

Basically, can the asserts fire?

int x = get_random_int();
double dx = x;
int x1 = (int) dx;
assert(x1 == x);
if (INT_MAX - 10 > x)
{
    dx += 10;
    int x2 = (int) dx;
    assert(x + 10 == x2);
}

Obviously, anything involving more complicated expressions with division and the like won't work ((int)(5.0/3*3) is not the same as 5/3*3), but I wonder whether conversions and addition/subtraction (if no overflow occurs) preserve equivalence.

NoSenseEtAl
  • I wouldn't say duplicate, though I don't know exactly what counts as a duplicate... I mean, I could get my answer from some of the answers there, but the question is not the same. :) – NoSenseEtAl Nov 07 '12 at 12:42
  • @NoSenseEtAl: it's essentially asking the same question. Any (good) answer to the other one would be a good answer to this one as well. – Joachim Sauer Nov 07 '12 at 12:54

2 Answers

5

If the number of bits in the mantissa is >= the number of bits in the integer, then the answer is yes. In your question you give specific, known sizes for int and the mantissa of double, but it's useful to know that this is not guaranteed by the 2003 C++ standard, which says nothing about the relative sizes of int and double's mantissa.

Note that C and C++ are not required to use IEEE 754 floating-point arithmetic. According to 3.9.1/8 of the 2003 C++ standard,

The value representation of floating-point types is implementation-defined.

In fact, C++ allows floating-point representations that don't even use binary mantissas. In C, the macros in <float.h> and <limits.h> can be used to infer information about the fundamental types. In particular, if FLT_RADIX raised to the power DBL_MANT_DIG is greater than or equal to INT_MAX, then all int values can be represented exactly. In C++, the relevant quantities are std::numeric_limits<double>::radix, std::numeric_limits<double>::digits and std::numeric_limits<int>::max(), all available from <limits>.
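The check described above can be done at compile time. The following is only a sketch: `fits_exactly` is a hypothetical helper name, and it is deliberately conservative in that it recognises only binary radixes and compares digit counts rather than computing the radix raised to a power.

```cpp
#include <limits>

// Hypothetical helper (not part of any standard library): true when every
// value of integer type Int is exactly representable in floating-point
// type Float. Conservative: any non-binary radix is reported as false.
template <typename Int, typename Float>
constexpr bool fits_exactly() {
    return std::numeric_limits<Float>::radix == 2
        // enough precision for every non-sign bit of Int
        && std::numeric_limits<Float>::digits >= std::numeric_limits<Int>::digits
        // enough exponent range for the most negative value, -2^digits
        && std::numeric_limits<Float>::max_exponent > std::numeric_limits<Int>::digits;
}
```

A line such as `static_assert(fits_exactly<int, double>(), "int does not fit in double");` then turns the assumption into a build failure on platforms where it does not hold.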

Given two integer operands and an operation that always produces an integer from integer operands (such as + or *, but not /), all IEEE 754 rounding modes will produce an integer exactly. If this integer is representable in an int (and therefore exactly representable in a double, given our assumption that its mantissa is at least as wide as an int), then it will be the same integer you would get by using the corresponding integer operation. Any sensible FP implementation will preserve the above guarantees, even if it is not IEEE 754 compliant.
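As a rough illustration of that guarantee, here is a sketch that compares double arithmetic against exact integer arithmetic. The helper names are hypothetical, and the code assumes that long long is wider than int and that double can hold every int exactly (as with a 32-bit int and a 53-bit mantissa).

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: true if v lies within the range of int.
bool fits_in_int(long long v) {
    return v >= INT_MIN && v <= INT_MAX;
}

// True if a + b computed in double agrees with exact integer addition.
// Sums outside the range of int fall outside the guarantee, so they are
// skipped and reported as true.
bool add_matches(int a, int b) {
    const long long exact = static_cast<long long>(a) + b;
    if (!fits_in_int(exact)) return true;
    const double r = static_cast<double>(a) + static_cast<double>(b);
    return static_cast<int>(r) == static_cast<int>(exact);
}

// Same check for multiplication.
bool mul_matches(int a, int b) {
    const long long exact = static_cast<long long>(a) * b;
    if (!fits_in_int(exact)) return true;
    const double r = static_cast<double>(a) * static_cast<double>(b);
    return static_cast<int>(r) == static_cast<int>(exact);
}

// A few boundary cases; the literals assume a 32-bit int.
void check_examples() {
    assert(add_matches(INT_MAX - 10, 10));   // lands exactly on INT_MAX
    assert(add_matches(INT_MIN, INT_MAX));   // result is -1
    assert(mul_matches(46340, 46340));       // largest square below INT_MAX
    assert(mul_matches(-65536, 32768));      // lands exactly on INT_MIN
}
```

Division is left out on purpose, since it does not always produce an integer from integer operands.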

j_random_hacker
3

Yes. All N-bit ints can be represented in a floating-point format that has at least N-1 mantissa bits (because of the implicit leading 1 bit, which doesn't need to be stored) and an exponent that can store at least N, i.e. has log2(N)+1 bits.

So you can store an int32_t in a floating-point value with 31 bits of mantissa, five bits of exponent, and one sign bit, which fits in a typical double but not a float. Conversely, a float, with only 23 explicit bits of mantissa (24 counting the implicit leading 1), can only accurately store ints with up to 24 bits, i.e. +/-16,777,215.
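The limit for float can be found empirically. This is a sketch with hypothetical helper names, assuming IEEE 754 single precision for float and a double that holds every int exactly, so the comparison itself is done in double.

```cpp
#include <cassert>
#include <climits>

// True if v survives conversion to float unchanged. The comparison is done
// in double to avoid converting an out-of-range float back to int.
// volatile forces the value to be rounded to float precision even on
// platforms that evaluate expressions with excess precision.
bool exact_in_float(int v) {
    volatile float f = static_cast<float>(v);
    return static_cast<double>(f) == static_cast<double>(v);
}

// Smallest non-negative int that float cannot represent exactly.
// Returns INT_MAX if every smaller value survived the conversion.
int first_int_not_exact_in_float() {
    int v = 0;
    while (v < INT_MAX && exact_in_float(v)) {
        ++v;
    }
    return v;
}
```

With IEEE 754 single precision, `first_int_not_exact_in_float()` returns 16,777,217, which is 2^24 + 1: every integer up to and including 2^24 is exact, and the first gap appears immediately after it.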

pndc
  • Single precision has 23 explicit bits (so can represent all integers with up to 24 bits, not 25). – Stephen Canon Nov 07 '12 at 12:42
  • Good point about the leading 1-bit and the need for a sufficiently large exponent. While it's hard to imagine an FP implementation with bits(exponent) < log(bits(mantissa)), it pays to be specific about these things! – j_random_hacker Nov 07 '12 at 12:52
  • Stephen is right about my off-by-one error on the range of a float. So the effective range is +/-16,777,215. – pndc Nov 07 '12 at 13:10