
If I have two random floats or doubles that each represent an exact integer (within the range of a 32-bit integer), can I expect every addition, subtraction, and multiplication between them to yield an integer-valued float/double with no fractional part?

float x = randInt();
float y = randInt();
float resultAdd = x + y;
float resultSub = x - y;
float resultMul = x * y;
if(fract(resultAdd) == 0.f && fract(resultSub) == 0.f && fract(resultMul) == 0.f){
    // will this section always execute, assuming no overflow occurred?
}

Everyone understands to never trust floating-point precision, but I would like to rebuild trust where appropriate. Given that some interpreted languages (unwisely) use floats/doubles as the basis of a generic "number" type, it's important to know what operations can preserve a float's status as an integer.

phuclv
Anne Quinn
  • "I have two random floats or doubles that represents an exact integer, " - erm say what? If you need an integer use an integer. floats can't represent all ints exactly (e.g. 16777217 ); doubles can't either even though they can represent more... – Mitch Wheat Nov 02 '18 at 02:08
  • @MitchWheat - I know they can represent them, but I want to know if these three operations are safe in preserving it. And yes, I agree about just using an int, but some interpreted languages use doubles in lieu of any true int type: GameMaker, ActionScript, etc. – Anne Quinn Nov 02 '18 at 02:10
  • @MitchWheat - I have narrowed the definition to a 32bit integer – Anne Quinn Nov 02 '18 at 02:14
  • If you have IEEE floating-point, and the source integers and the result integers can be represented exactly, then yes. But if `float` is single-precision (32 bits), it obviously cannot exactly represent all values of a 32-bit integer, let alone the results of those operations. – Deduplicator Nov 02 '18 at 02:17

2 Answers


An IEEE-754 single-precision float has only a 24-bit significand (mantissa), so it cannot exactly represent every integer in the 32-bit range.

For example, if x = 16777216.0f and y = 1.0f, then x + y is not equal to 16777217: under the default rounding mode the sum rounds back to 16777216.0f.
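A minimal C sketch of that example, assuming `float` is IEEE-754 single precision and the default round-to-nearest mode is in effect. The helper names `add_float` and `float_sum_is_exact` are made up for illustration and are not part of any library.

```c
#include <stdbool.h>

/* Hypothetical helper: adds two floats and stores the sum in a float
   variable, so the result is rounded to float precision. */
static float add_float(float x, float y) {
    float sum = x + y;
    return sum;
}

/* Returns true if the float sum of x and y equals the exact integer sum.
   The exact sum of two 32-bit ints always fits in a long long, and both
   values compared below are exactly representable as doubles. */
static bool float_sum_is_exact(int x, int y) {
    long long exact = (long long)x + (long long)y;
    float sum = add_float((float)x, (float)y);
    return (double)sum == (double)exact;
}
```

With these helpers, `add_float(16777216.0f, 1.0f)` compares equal to `16777216.0f`, and `float_sum_is_exact(16777216, 1)` is false, while `float_sum_is_exact(16777215, 1)` is true because 2^24 itself is representable.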

On the other hand, IEEE-754 double precision has a 53-bit significand, so it can exactly represent every 32-bit integer. That's why some languages like JavaScript or Lua (before version 5.3) have only double for all numerical values.
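Being able to represent every 32-bit operand does not by itself make every result exact. The sketch below, assuming `double` is IEEE-754 double precision, compares double arithmetic against 64-bit integer arithmetic. The function names are hypothetical. Sums of two 32-bit integers need at most 33 bits and so always fit in 53, but a product can need up to 63 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a + b computed in double matches the exact integer sum. */
static bool double_sum_is_exact(int32_t a, int32_t b) {
    int64_t exact = (int64_t)a + (int64_t)b;
    double sum = (double)a + (double)b;
    return (int64_t)sum == exact;
}

/* True if a * b computed in double matches the exact integer product.
   |a * b| is at most 2^62, so neither the int64_t product nor the
   conversion of the double back to int64_t can overflow. */
static bool double_product_is_exact(int32_t a, int32_t b) {
    int64_t exact = (int64_t)a * (int64_t)b;
    double prod = (double)a * (double)b;
    return (int64_t)prod == exact;
}
```

For instance, `double_product_is_exact(65536, 65536)` holds because 2^32 is representable, whereas `double_product_is_exact(INT32_MAX, INT32_MAX)` does not: the exact product is odd and larger than 2^53, so it has to be rounded.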

See "Are all integer values perfectly represented as doubles?"

phuclv

it's important to know what operations can preserve a float's status as an integer.

IEEE-754 mandates that addition, subtraction, multiplication, division and square root be as precise as possible (quoting IEEE-754 2008):

Each of the computational operations that return a numeric result specified by this standard shall be performed as if it first produced an intermediate result correct to infinite precision and with unbounded range, and then rounded that intermediate result, if necessary, to fit in the destination’s format.

So, if the operands are integers and abs(result) is less than or equal to 2^24 (in the case of float) or 2^53 (in the case of double), then the result will be exact.

Note: addition, subtraction and multiplication of integer-valued floats will always produce an integer-valued result (regardless of the range), but that result may not be exact (if it falls outside the previously mentioned range).
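A small C sketch of that note, assuming IEEE-754 single-precision `float` and round-to-nearest. `is_integer_valued` stands in for the question's `fract(v) == 0.f` test, and both helper names are hypothetical.

```c
#include <stdbool.h>

/* True if v has no fractional part. Assumes v is finite. */
static bool is_integer_valued(float v) {
    if (v != v) {
        return false;  /* NaN is not an integer */
    }
    /* Every finite float with magnitude >= 2^23 is already an integer. */
    if (v >= 8388608.0f || v <= -8388608.0f) {
        return true;
    }
    /* Below 2^23 the value fits in an int, so a round trip is safe. */
    return (float)(int)v == v;
}

/* Multiplies two floats and rounds the product to float precision. */
static float mul_float(float x, float y) {
    float product = x * y;
    return product;
}
```

Taking x = 16777215.0f and y = 3.0f, the exact product is 50331645, which lies above 2^24. Floats in that range are spaced 4 apart, so `mul_float` gives 50331644.0f: still integer-valued, but not the true product.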

geza