Representing integers in doubles

Question

Can a double (of a given number of bytes, with a reasonable mantissa/exponent balance) always fully precisely hold the range of an unsigned integer of half that number of bytes?

E.g. can an eight byte double fully precisely hold the range of numbers of a four byte unsigned int?

What this boils down to is whether a two byte float can hold the range of a one byte unsigned int.

A one byte unsigned int will of course be 0 -> 255.

paxdiablo · Accepted Answer · 2019-03-16T00:34:36.413

44

An IEEE754 64-bit double can represent any 32-bit integer, simply because it has 53-odd^(a) bits available for precision and the 32-bit integer only needs, well, 32 :-)

It would be plausible for a (non IEEE754 double precision) 64-bit floating point number to have fewer than 32 bits of precision. That would allow truly huge numbers (due to the exponent) but at the cost of precision.

The bottom line is that, provided there are at least as many bits of precision in the mantissa of the floating point number as there are in the integer (and enough bits in the exponent to scale it), the integer can be represented without loss of precision.
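As a rough illustration of that rule, here is a minimal Python sketch. It assumes Python's `float` is an IEEE754 double (true on mainstream platforms, but not guaranteed by the language), and `round_trips` is a helper name made up for this example. It only spot-checks the edges of the unsigned 32-bit range rather than every value.

```python
import sys

# Significand precision of Python's float; 53 where float is an IEEE754 double.
PRECISION_BITS = sys.float_info.mant_dig

def round_trips(n: int) -> bool:
    """Return True if converting n to float and back yields n again."""
    return int(float(n)) == n

# Edges of the unsigned 32-bit range plus a few values in between.
samples = [0, 1, 2**31 - 1, 2**31, 2**32 - 2, 2**32 - 1]
all_exact = all(round_trips(n) for n in samples)

print(PRECISION_BITS >= 32, all_exact)
```

If the precision is at least 32 bits, every sample should survive the round trip unchanged.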

^(a) Technically, the 53rd bit of precision is an implied 1 at the start of the sequence so the amount of "variability" may only be 52 bits. Whether it's 52 or 53, it's still enough bits to represent every 32-bit integer.
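To see where the 53 bits run out, a small Python sketch (again assuming `float` is an IEEE754 double; `LIMIT` is just a name chosen for this example) can probe the values around 2**53:

```python
LIMIT = 2**53  # first power of two past what 53 bits of precision can count

# Every integer up to and including 2**53 converts exactly.
below_is_exact = int(float(LIMIT - 1)) == LIMIT - 1
limit_is_exact = int(float(LIMIT)) == LIMIT

# 2**53 + 1 needs 54 significant bits, so it gets rounded to a neighbour.
above_is_exact = int(float(LIMIT + 1)) == LIMIT + 1

print(below_is_exact, limit_is_exact, above_is_exact)
```

The first two checks should hold and the third should fail, which is consistent with 53 bits of effective precision rather than 52.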

edited Mar 16 '19 at 00:34

answered Apr 17 '09 at 06:08

paxdiablo


Looks like it actually has 52 bits of precision, right? Since only 52 bits exist in the "fraction" section of the structure: https://en.wikipedia.org/wiki/Double-precision_floating-point_format – B T Mar 15 '19 at 22:54

@BT, that's why I said "53-odd", meaning *about* 53, I'll clarify. There are only 52 bits stored in the encoding but there's an implicit 1-bit at the start. Whether you consider it 52 or 53 depends on your viewpoint but the bottom line is that there are enough bits to represent any 32-bit value quite easily. – paxdiablo Mar 16 '19 at 00:32

1800 INFORMATION · Answer 2 · 2009-04-18T00:32:40.843

7

Yes. A float (or double) is guaranteed to exactly represent any integer that does not need to be truncated, that is, any integer whose significant bits fit within the precision of the format. For a double, there are 53 bits of precision, so that is more than enough to exactly represent any 32 bit integer, and a tiny (statistically speaking) proportion of 64 bit ones too.
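One way to picture which 64 bit integers make the cut is the sketch below. `is_exact_in_double` is a hypothetical helper written for this example, it assumes 53 bits of precision, and it ignores exponent limits (which do not matter for 64 bit values). Trailing zero bits are absorbed by the exponent, so only the span between the highest and lowest set bits counts.

```python
def is_exact_in_double(n: int, precision: int = 53) -> bool:
    """Predict whether integer n needs no truncation in a 53-bit significand."""
    if n == 0:
        return True
    n = abs(n)
    # Strip trailing zero bits; the exponent accounts for those.
    n >>= (n & -n).bit_length() - 1
    return n.bit_length() <= precision

# Compare the prediction against an actual int -> float -> int round trip.
candidates = [2**32 - 1, 2**53, 2**53 + 1, 2**63, 2**64 - 1, 3 * 2**60]
for n in candidates:
    print(n, is_exact_in_double(n), int(float(n)) == n)
```

Large powers of two such as 2**63 pass because they need only one significant bit, while 2**64 - 1 needs all 64 and does not.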

edited Apr 18 '09 at 00:32

answered Apr 17 '09 at 06:09

1800 INFORMATION


score 2 · Answer 3 · answered Apr 17 '09 at 06:09


Which range you can represent exactly depends on a lot of factors in your implementation, but you can lower-bound it by saying that, if the exponent is chosen so that the lowest mantissa bit has a weight of 1, you can exactly represent integers up to the width of your mantissa field (assuming a sign bit). For IEEE 754 double-precision, this means you can represent at least all 52-bit numbers exactly. In the IEEE 754 formats, the mantissa is over half the width of the overall structure.
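For the smaller formats this can be checked exhaustively. The sketch below uses Python's `struct` module, whose `e` and `f` format characters pack IEEE 754 half (2 byte) and single (4 byte) precision values; `survives` is a helper name invented for this example.

```python
import struct

def survives(fmt: str, n: int) -> bool:
    """Pack n as a float in the given struct format and check it comes back unchanged."""
    (back,) = struct.unpack(fmt, struct.pack(fmt, float(n)))
    return back == n

# Two byte float versus one byte unsigned int: all of 0..255.
half_holds_uint8 = all(survives("<e", n) for n in range(2**8))

# Four byte float versus two byte unsigned int: all of 0..65535.
single_holds_uint16 = all(survives("<f", n) for n in range(2**16))

print(half_holds_uint8, single_holds_uint16)
```

Both checks should come out true, since half precision carries 11 bits of precision and single precision carries 24, counting the implicit leading bit.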

answered Apr 17 '09 at 06:09

Matt J


Wouldn't one be able to represent a 53-bit (not 52-bit) integer exactly, with a 52-bit mantissa, because of the implicit bit to the left of the binary point? – Apriori Feb 24 '17 at 02:39

score -5 · Answer 4 · answered Apr 17 '09 at 06:08


I wouldn't use the words "fully precisely" when talking about floating-point numbers. But yes, a double can represent a 32-bit integer.

I do not know which other combinations of floats and ints this is also true for.

Practically speaking, you don't want to bother using floating point above what your machine supports, so just switch to rational arithmetic with bignums. That way, you're guaranteed precision.
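As a small sketch of what that looks like in practice, Python's built-in `int` is arbitrary precision and `fractions.Fraction` provides rational arithmetic on top of it. The variable names are just for illustration.

```python
from fractions import Fraction

# Binary floating point cannot hold 0.1, 0.2 or 0.3 exactly, so this fails.
float_sum_matches = (0.1 + 0.2 == 0.3)

# Rationals built from integers keep the exact values, so this holds.
rational_sum_matches = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))

# Arbitrary-precision integers stay exact well past what a double can hold.
big = 2**100 + 1
big_is_exact = (big - 2**100 == 1)

print(float_sum_matches, rational_sum_matches, big_is_exact)
```

The trade-off is speed and memory: exact rationals can grow without bound, whereas a double is always eight bytes.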

answered Apr 17 '09 at 06:08

Frank Krueger



A floating-point number can precisely represent some numbers, and among those are all integers that fit within its precision. Once you start doing division, or multiplication that could cause overflow, you've probably lost precision. Nor does everybody have a handy system for bignums and/or rational numbers. – David Thornley Apr 17 '09 at 17:08
