Is there a tool to know whether a value has an exact binary representation as a floating point variable?

Question

My C API has a function that takes a double as input. Only 3 or 4 values are valid; all other values are invalid and rejected.

I'd like to check whether all my valid input values can be represented exactly, so that I can avoid the epsilon check and keep the code readable.

Is there a tool (preferably on command line) that could tell me whether a decimal value has an exact binary representation as a floating point value?

"Is there a tool (preferably on command line) that could tell me whether a decimal value has an exact binary representation as a floating point value?" - that is a very unusual requirement. – Mitch Wheat, Dec 08 '11 at 09:35
Unusual, but actually I think it's quite useful whenever you need to put a fractional constant into your source. – Gunther Piez, Dec 08 '11 at 09:38
By "decimal value", do you mean an integer? And what do you mean by "exact binary representation"? – moooeeeep, Dec 08 '11 at 09:40
Which is not exactly representable as a double, BTW. This is a constant source of "errors" of the kind "My compiler doesn't calculate doubles correctly", which directly translates to "I am to stupid to use floating point correctly". – Gunther Piez, Dec 08 '11 at 09:47
@drhirsch: No, too ignorant. Just like you're too ignorant to spell "too stupid" correctly, but you aren't too stupid because you're capable of learning ;-p – Steve Jessop, Dec 08 '11 at 10:18
Yes, ignorant is probably better wording. But I am not a native speaker, so sometimes I may accidentally choose words with too strong a meaning. What I wanted to say is "zu doof" (German for "too dumb"). – Gunther Piez, Dec 08 '11 at 10:29
@Didier: Btw, could you just provide symbolic names for the 3 or 4 valid inputs, and have the caller use those? They could be `#define`, `extern const` globals, or just integers that you look up in a table. It solves this problem, and also the problem of the caller accidentally passing you a value that is very, very close to correct, but slightly off due to an inaccuracy in the calculation they used to produce it. For example, did they use `M_PI_2` or `M_PI/2`? The two might not be equal. – Steve Jessop, Dec 08 '11 at 10:34
Khronos do exactly this; all the enumerations in, e.g., OpenGL or EGL have values which are exactly representable as floats, doubles, ints or fixed-points. They do this by using small integers (usually under 0x10000). Doubles can represent every integer up to 2^53 exactly, floats every integer up to 2^24, and fixed-points about 2^16. Therefore, integers up to 2^16 can be converted to any of those formats without loss of precision. Also, hi, Steve. – David Given, Dec 08 '11 at 15:19

Alexey Frunze · Answer 1 · 2013-04-21T03:26:43.483

I've written a decimal-to-float/double converter for fun and made it produce an extra output flag telling whether or not the resulting floating-point value represents the input decimal string exactly.

The main idea is very simple: whenever truncation or rounding occurs during the conversion (a nonzero remainder from a division by 10 or by 2, or an overflow to infinity), it's remembered in an "inexact" flag.

The code is not the most efficient, nor does it fully validate the input against every possible problem (e.g. an exponent that is too large), but it seems to do the job for well-formed decimal strings:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <limits.h>

#define DEBUG_PRINT 0

#ifndef MIN
#define MIN(A,B) (((A) <= (B)) ? (A) : (B))
#endif

#ifndef MAX
#define MAX(A,B) (((A) >= (B)) ? (A) : (B))
#endif

static
int ParseDecimal(const char*  s,
                 int*         pSign,
                 const char** ppIntStart,
                 const char** ppIntEnd,
                 const char** ppFrcStart,
                 const char** ppFrcEnd,
                 int*         pExp)
{
  int sign = 1;
  const char* pIntStart = NULL;
  const char* pIntEnd = NULL;
  const char* pFrcStart = NULL;
  const char* pFrcEnd = NULL;
  int expSign = 1;
  const char* pExpStart = NULL;
  const char* pExpEnd = NULL;
  const char* p;
  int exp = 0;

  if (s == NULL) return -1;

  // Parse the sign and the integer part
  if (*s == '-') sign = -1, s++;
  else if (*s == '+') s++;
  while (*s && *s != '.' && *s != 'e' && *s != 'E')
  {
    if (*s < '0' || *s > '9') return -1;
    if (pIntStart == NULL) pIntStart = s;
    pIntEnd = s++;
  }

  // Parse the fractional part
  if (*s == '.')
  {
    s++;
    while (*s && *s != 'e' && *s != 'E')
    {
      if (*s < '0' || *s > '9') return -1;
      if (pFrcStart == NULL) pFrcStart = s;
      pFrcEnd = s++;
    }
  }

  if (pIntStart == NULL && pFrcStart == NULL) return -1;

  // Parse the exponent
  if (*s == 'e' || *s == 'E')
  {
    s++;

    if (*s == '-') expSign = -1, s++;
    else if (*s == '+') s++;

    if (!*s) return -1;

    while (*s)
    {
      if (*s < '0' || *s > '9') return -1;
      if (pExpStart == NULL) pExpStart = s;
      pExpEnd = s++;
    }
  }

  // Calculate the exponent
  for (p = pExpStart; p && p <= pExpEnd; p++)
    exp = exp * 10 + *p - '0';
  exp *= expSign;

  // Skip any trailing and leading zeroes
  // in the fractional and integer parts
  if (pFrcStart != NULL)
  {
    exp -= pFrcEnd + 1 - pFrcStart;
    if (pIntStart == NULL)
      while (pFrcStart < pFrcEnd && *pFrcStart == '0') pFrcStart++;
    while (pFrcEnd > pFrcStart && *pFrcEnd == '0') pFrcEnd--, exp++;
    if (*pFrcEnd == '0' && pIntStart != NULL) pFrcStart = pFrcEnd = NULL, exp++;
  }
  if (pIntStart != NULL)
  {
    if (pFrcStart == NULL)
      while (pIntEnd > pIntStart && *pIntEnd == '0') pIntEnd--, exp++;
    while (pIntStart < pIntEnd && *pIntStart == '0') pIntStart++;
    if (*pIntStart == '0' && pFrcStart != NULL)
    {
      pIntStart = pIntEnd = NULL;
      while (pFrcStart < pFrcEnd && *pFrcStart == '0') pFrcStart++;
    }
  }
  if ((pIntStart != NULL && *pIntStart == '0') ||
      (pFrcEnd != NULL && *pFrcEnd == '0'))
  {
    exp = 0;
  }

  *pSign = sign;
  *ppIntStart = pIntStart;
  *ppIntEnd   = pIntEnd;
  *ppFrcStart = pFrcStart;
  *ppFrcEnd   = pFrcEnd;
  *pExp = exp;

  return 0;
}

static
void ChainMultiplyAdd(unsigned char* pChain,
                      size_t         ChainLen,
                      unsigned char  Multiplier,
                      unsigned char  Addend)
{
  unsigned carry = Addend;

  while (ChainLen--)
  {
    carry += *pChain * Multiplier;
    *pChain++ = (unsigned char)(carry & 0xFF);
    carry >>= 8;
  }
}

static
void ChainDivide(unsigned char* pChain,
                 size_t         ChainLen,
                 unsigned char  Divisor,
                 unsigned char* pRemainder)
{
  unsigned remainder = 0;

  while (ChainLen)
  {
    remainder += pChain[ChainLen - 1];
    pChain[ChainLen - 1] = remainder / Divisor;
    remainder = (remainder % Divisor) << 8;
    ChainLen--;
  }

  if (pRemainder != NULL)
    *pRemainder = (unsigned char)(remainder >> 8);
}

int DecimalToIeee754Binary(const char* s,
                           unsigned FractionBitCnt,
                           unsigned ExponentBitCnt,
                           int* pInexact,
                           unsigned long long* pFloat)
{
  const char* pIntStart;
  const char* pIntEnd;
  const char* pFrcStart;
  const char* pFrcEnd;
  const char* p;
  int sign;
  int exp;
  int tmp;
  size_t numDecDigits;
  size_t denDecDigits;
  size_t numBinDigits;
  size_t numBytes;
  unsigned char* pNum = NULL;
  unsigned char remainder;
  int binExp = 0;
  int inexact = 0;
  int lastInexact = 0;

  if (FractionBitCnt < 3 ||
      ExponentBitCnt < 3 ||
      FractionBitCnt >= CHAR_BIT * sizeof(*pFloat) ||
      ExponentBitCnt >= CHAR_BIT * sizeof(*pFloat) ||
      FractionBitCnt + ExponentBitCnt >= CHAR_BIT * sizeof(*pFloat))
  {
    return -1;
  }

  tmp = ParseDecimal(s,
                     &sign,
                     &pIntStart,
                     &pIntEnd,
                     &pFrcStart,
                     &pFrcEnd,
                     &exp);
  if (tmp) return tmp;

  numDecDigits = ((pIntStart != NULL) ? pIntEnd + 1 - pIntStart : 0) +
                 ((pFrcStart != NULL) ? pFrcEnd + 1 - pFrcStart : 0) +
                 ((exp >= 0) ? exp : 0);
  denDecDigits = 1 + ((exp < 0) ? -exp : 0);

#if DEBUG_PRINT
  printf("%s    ", s);

  printf("%c", "- +"[1+sign]);
  for (p = pIntStart; p && p <= pIntEnd; p++) printf("%c", *p);
  for (p = pFrcStart; p && p <= pFrcEnd; p++) printf("%c", *p);
  printf(" E %d", exp);
  printf("    %zu/%zu    ", numDecDigits, denDecDigits);
//  fflush(stdout);
  printf("\n");
#endif

  // 10/3=3.3(3) > log2(10)~=3.32
  if (exp >= 0)
    numBinDigits = MAX((numDecDigits * 10 + 2) / 3,
                       FractionBitCnt + 1);
  else
    numBinDigits = MAX((numDecDigits * 10 + 2) / 3,
                       (denDecDigits * 10 + 2) / 3 + FractionBitCnt + 1 + 1);

  numBytes = (numBinDigits + 7) / 8;

  pNum = malloc(numBytes);
  if (pNum == NULL) return -2;
  memset(pNum, 0, numBytes);

  // Convert the numerator to binary
  for (p = pIntStart; p && p <= pIntEnd; p++)
    ChainMultiplyAdd(pNum, numBytes, 10, *p - '0');
  for (p = pFrcStart; p && p <= pFrcEnd; p++)
    ChainMultiplyAdd(pNum, numBytes, 10, *p - '0');
  for (tmp = exp; tmp > 0; tmp--)
    ChainMultiplyAdd(pNum, numBytes, 10, 0);

#if DEBUG_PRINT
  printf("num   : ");
  for (p = pNum + numBytes - 1; p >= (char*)pNum; p--)
    printf("%02X", (unsigned char)*p);
  printf("\n");
#endif

  // If the denominator isn't 1, divide the numerator by the denominator
  // getting at least FractionBitCnt+2 significant bits of quotient
  if (exp < 0)
  {
    binExp = -(int)(numBinDigits - (numDecDigits * 10 + 2) / 3);
    for (tmp = binExp; tmp < 0; tmp++)
      ChainMultiplyAdd(pNum, numBytes, 2, 0);
#if DEBUG_PRINT
  printf("num <<: ");
  for (p = pNum + numBytes - 1; p >= (char*)pNum; p--)
    printf("%02X", (unsigned char)*p);
  printf("\n");
#endif
    for (tmp = exp; tmp < 0; tmp++)
      ChainDivide(pNum, numBytes, 10, &remainder),
      lastInexact = inexact, inexact |= !!remainder;
  }

#if DEBUG_PRINT
  for (p = pNum + numBytes - 1; p >= (char*)pNum; p--)
    printf("%02X", (unsigned char)*p);
  printf(" * 2^%d (%c)", binExp, "ei"[inexact]);
  printf("\n");
#endif

  // Find the most significant bit and normalize the mantissa
  // by shifting it left
  for (tmp = numBytes - 1; tmp >= 0 && !pNum[tmp]; tmp--);
  if (tmp >= 0)
  {
    tmp = tmp * 8 + 7;
    while (!(pNum[tmp / 8] & (1 << tmp % 8))) tmp--;
    while (tmp < (int)FractionBitCnt)
      ChainMultiplyAdd(pNum, numBytes, 2, 0), binExp--, tmp++;
  }

  // Find the most significant bit and normalize the mantissa
  // by shifting it right
  do
  {
    remainder = 0;
    for (tmp = numBytes - 1; tmp >= 0 && !pNum[tmp]; tmp--);
    if (tmp >= 0)
    {
      tmp = tmp * 8 + 7;
      while (!(pNum[tmp / 8] & (1 << tmp % 8))) tmp--;
      while (tmp > (int)FractionBitCnt)
        ChainDivide(pNum, numBytes, 2, &remainder),
        lastInexact = inexact, inexact |= !!remainder, binExp++, tmp--;
      while (binExp < 2 - (1 << ((int)ExponentBitCnt - 1)) - (int)FractionBitCnt)
        ChainDivide(pNum, numBytes, 2, &remainder),
        lastInexact = inexact, inexact |= !!remainder, binExp++;
    }
    // Round to nearest even
    remainder &= (lastInexact | (pNum[0] & 1));
    if (remainder)
      ChainMultiplyAdd(pNum, numBytes, 1, 1);
  } while (remainder);

#if DEBUG_PRINT
  for (p = pNum + numBytes - 1; p >= (char*)pNum; p--)
    printf("%02X", (unsigned char)*p);
  printf(" * 2^%d", binExp);
  printf("\n");
#endif

  // Collect the result's mantissa
  *pFloat = 0;
  while (tmp >= 0)
  {
    *pFloat <<= 8;
    *pFloat |= pNum[tmp / 8];
    tmp -= 8;
  }

  // Collect the result's exponent
  binExp += (1 << ((int)ExponentBitCnt - 1)) - 1 + (int)FractionBitCnt;
  if (!(*pFloat & (1ull << FractionBitCnt))) binExp = 0; // Subnormal or 0
  *pFloat &= ~(1ull << FractionBitCnt);
  if (binExp >= (1 << (int)ExponentBitCnt) - 1)
    binExp = (1 << (int)ExponentBitCnt) - 1, *pFloat = 0, inexact |= 1; // Infinity
  *pFloat |= (unsigned long long)binExp << FractionBitCnt;

  // Collect the result's sign
  *pFloat |= (unsigned long long)(sign < 0) <<
             (ExponentBitCnt + FractionBitCnt);

  free(pNum);

  *pInexact = inexact;

  return 0;
}

#define TEST_ENTRY(n)  { #n, n, n##f }
#define TEST_ENTRYI(n) { #n, n, n }

struct
{
  const char* Decimal;
  double Dbl;
  float Flt;
} const testData[] =
{
  TEST_ENTRYI(0),
  TEST_ENTRYI(000),
  TEST_ENTRY(00.),
  TEST_ENTRY(.00),
  TEST_ENTRY(00.00),
  TEST_ENTRYI(1),
  TEST_ENTRY(10e-1),
  TEST_ENTRY(.1e1),
  TEST_ENTRY(.01e2),
  TEST_ENTRY(00.00100e3),
  TEST_ENTRYI(12),
  TEST_ENTRY(12.),
  TEST_ENTRYI(+12),
  TEST_ENTRYI(-12),
  TEST_ENTRY(.12),
  TEST_ENTRY(+.12),
  TEST_ENTRY(-.12),
  TEST_ENTRY(12.34),
  TEST_ENTRY(+12.34),
  TEST_ENTRY(-12.34),
  TEST_ENTRY(00.100),
  TEST_ENTRY(00100.),
  TEST_ENTRY(00100.00100),
  TEST_ENTRY(1e4),
  TEST_ENTRY(0.5),
  TEST_ENTRY(0.6),
  TEST_ENTRY(0.25),
  TEST_ENTRY(0.26),
  TEST_ENTRY(0.125),
  TEST_ENTRY(0.126),
  TEST_ENTRY(0.0625),
  TEST_ENTRY(0.0624),
  TEST_ENTRY(0.03125),
  TEST_ENTRY(0.03124),
  TEST_ENTRY(1e23),
  TEST_ENTRY(1E-23),
  TEST_ENTRY(1e+23),
  TEST_ENTRY(12.34E56),
  TEST_ENTRY(+12.34E+56),
  TEST_ENTRY(-12.34e-56),
  TEST_ENTRY(+.12E+34),
  TEST_ENTRY(-.12e-34),
  TEST_ENTRY(3.4028234e38),
  TEST_ENTRY(3.4028235e38),
  TEST_ENTRY(3.4028236e38),
  TEST_ENTRY(1.7976931348623158e308),
  TEST_ENTRY(1.7976931348623159e308),
  TEST_ENTRY(1e1000),
  TEST_ENTRY(-1.7976931348623158e308),
  TEST_ENTRY(-1.7976931348623159e308),
  TEST_ENTRY(2.2250738585072014e-308),
  TEST_ENTRY(2.2250738585072013e-308),
  TEST_ENTRY(2.2250738585072012e-308),
  TEST_ENTRY(2.2250738585072011e-308),
  TEST_ENTRY(4.9406564584124654e-324),
  TEST_ENTRY(2.4703282292062328e-324),
  TEST_ENTRY(2.4703282292062327e-324),
  TEST_ENTRY(-4.9406564584124654e-325),
  TEST_ENTRY(1e-1000),

  // Extra test data from Vern Paxson's paper
  // "A Program for Testing IEEE Decimal–Binary Conversion"
  TEST_ENTRY(5e-20                     ),
  TEST_ENTRY(67e+14                    ),
  TEST_ENTRY(985e+15                   ),
  TEST_ENTRY(7693e-42                  ),
  TEST_ENTRY(55895e-16                 ),
  TEST_ENTRY(996622e-44                ),
  TEST_ENTRY(7038531e-32               ),
  TEST_ENTRY(60419369e-46              ),
  TEST_ENTRY(702990899e-20             ),
  TEST_ENTRY(6930161142e-48            ),
  TEST_ENTRY(25933168707e+13           ),
  TEST_ENTRY(596428896559e+20          ),
  TEST_ENTRY(3e-23                     ),
  TEST_ENTRY(57e+18                    ),
  TEST_ENTRY(789e-35                   ),
  TEST_ENTRY(2539e-18                  ),
  TEST_ENTRY(76173e+28                 ),
  TEST_ENTRY(887745e-11                ),
  TEST_ENTRY(5382571e-37               ),
  TEST_ENTRY(82381273e-35              ),
  TEST_ENTRY(750486563e-38             ),
  TEST_ENTRY(3752432815e-39            ),
  TEST_ENTRY(75224575729e-45           ),
  TEST_ENTRY(459926601011e+15          ),
  TEST_ENTRY(7e-27                     ),
  TEST_ENTRY(37e-29                    ),
  TEST_ENTRY(743e-18                   ),
  TEST_ENTRY(7861e-33                  ),
  TEST_ENTRY(46073e-30                 ),
  TEST_ENTRY(774497e-34                ),
  TEST_ENTRY(8184513e-33               ),
  TEST_ENTRY(89842219e-28              ),
  TEST_ENTRY(449211095e-29             ),
  TEST_ENTRY(8128913627e-40            ),
  TEST_ENTRY(87365670181e-18           ),
  TEST_ENTRY(436828350905e-19          ),
  TEST_ENTRY(5569902441849e-49         ),
  TEST_ENTRY(60101945175297e-32        ),
  TEST_ENTRY(754205928904091e-51       ),
  TEST_ENTRY(5930988018823113e-37      ),
  TEST_ENTRY(51417459976130695e-27     ),
  TEST_ENTRY(826224659167966417e-41    ),
  TEST_ENTRY(9612793100620708287e-57   ),
  TEST_ENTRY(93219542812847969081e-39  ),
  TEST_ENTRY(544579064588249633923e-48 ),
  TEST_ENTRY(4985301935905831716201e-48),
  TEST_ENTRY(9e+26                     ),
  TEST_ENTRY(79e-8                     ),
  TEST_ENTRY(393e+26                   ),
  TEST_ENTRY(9171e-40                  ),
  TEST_ENTRY(56257e-16                 ),
  TEST_ENTRY(281285e-17                ),
  TEST_ENTRY(4691113e-43               ),
  TEST_ENTRY(29994057e-15              ),
  TEST_ENTRY(834548641e-46             ),
  TEST_ENTRY(1058695771e-47            ),
  TEST_ENTRY(87365670181e-18           ),
  TEST_ENTRY(872580695561e-36          ),
  TEST_ENTRY(6638060417081e-51         ),
  TEST_ENTRY(88473759402752e-52        ),
  TEST_ENTRY(412413848938563e-27       ),
  TEST_ENTRY(5592117679628511e-48      ),
  TEST_ENTRY(83881765194427665e-50     ),
  TEST_ENTRY(638632866154697279e-35    ),
  TEST_ENTRY(3624461315401357483e-53   ),
  TEST_ENTRY(75831386216699428651e-30  ),
  TEST_ENTRY(356645068918103229683e-42 ),
  TEST_ENTRY(7022835002724438581513e-33),
};

int main(void)
{
  int i;
  int errors = 0;

  for (i = 0; i < sizeof(testData) / sizeof(testData[0]); i++)
  {
    unsigned long long fd;
    unsigned long long ff;
    unsigned long long f = 0;
    unsigned long long d = 0;
    int inexactf = 1;
    int inexactd = 1;
    int resf;
    int resd;
    int cmpf;
    int cmpd;

    memcpy(&d, &testData[i].Dbl, MIN(sizeof(d), sizeof(testData[i].Dbl)));
    memcpy(&f, &testData[i].Flt, MIN(sizeof(f), sizeof(testData[i].Flt)));

    resd = DecimalToIeee754Binary(testData[i].Decimal, 52, 11, &inexactd, &fd);
    resf = DecimalToIeee754Binary(testData[i].Decimal, 23,  8, &inexactf, &ff);

    cmpd = !!memcmp(&d, &fd, MIN(sizeof(d), sizeof(testData[i].Dbl)));
    cmpf = !!memcmp(&f, &ff, MIN(sizeof(f), sizeof(testData[i].Flt)));

    errors += !!resd + !!resf + !!cmpd + !!cmpf;

    printf("%26s %c= 0x%016llX %c= 0x%016llX\n",
           testData[i].Decimal,
           "!="[!inexactd],
           resd ? 0xBADBADBADBADBADBULL : fd,
           "!="[!memcmp(&d, &fd, MIN(sizeof(d), sizeof(testData[i].Dbl)))],
           d);

    printf("%26s %c=         0x%08llX %c= 0x%08llX\n",
           testData[i].Decimal,
           "!="[!inexactf],
           resf ? 0xBADBADBADBADBADBULL : ff,
           "!="[!memcmp(&f, &ff, MIN(sizeof(f), sizeof(testData[i].Flt)))],
           f);
  }

  printf("errors: %d\n", errors);

  return 0;
}

Output (on an x86 PC in 32-bit mode under Windows XP):

                         0 == 0x0000000000000000 == 0x0000000000000000
                         0 ==         0x00000000 == 0x00000000
                       000 == 0x0000000000000000 == 0x0000000000000000
                       000 ==         0x00000000 == 0x00000000
                       00. == 0x0000000000000000 == 0x0000000000000000
                       00. ==         0x00000000 == 0x00000000
                       .00 == 0x0000000000000000 == 0x0000000000000000
                       .00 ==         0x00000000 == 0x00000000
                     00.00 == 0x0000000000000000 == 0x0000000000000000
                     00.00 ==         0x00000000 == 0x00000000
                         1 == 0x3FF0000000000000 == 0x3FF0000000000000
                         1 ==         0x3F800000 == 0x3F800000
                     10e-1 == 0x3FF0000000000000 == 0x3FF0000000000000
                     10e-1 ==         0x3F800000 == 0x3F800000
                      .1e1 == 0x3FF0000000000000 == 0x3FF0000000000000
                      .1e1 ==         0x3F800000 == 0x3F800000
                     .01e2 == 0x3FF0000000000000 == 0x3FF0000000000000
                     .01e2 ==         0x3F800000 == 0x3F800000
                00.00100e3 == 0x3FF0000000000000 == 0x3FF0000000000000
                00.00100e3 ==         0x3F800000 == 0x3F800000
                        12 == 0x4028000000000000 == 0x4028000000000000
                        12 ==         0x41400000 == 0x41400000
                       12. == 0x4028000000000000 == 0x4028000000000000
                       12. ==         0x41400000 == 0x41400000
                       +12 == 0x4028000000000000 == 0x4028000000000000
                       +12 ==         0x41400000 == 0x41400000
                       -12 == 0xC028000000000000 == 0xC028000000000000
                       -12 ==         0xC1400000 == 0xC1400000
                       .12 != 0x3FBEB851EB851EB8 == 0x3FBEB851EB851EB8
                       .12 !=         0x3DF5C28F == 0x3DF5C28F
                      +.12 != 0x3FBEB851EB851EB8 == 0x3FBEB851EB851EB8
                      +.12 !=         0x3DF5C28F == 0x3DF5C28F
                      -.12 != 0xBFBEB851EB851EB8 == 0xBFBEB851EB851EB8
                      -.12 !=         0xBDF5C28F == 0xBDF5C28F
                     12.34 != 0x4028AE147AE147AE == 0x4028AE147AE147AE
                     12.34 !=         0x414570A4 == 0x414570A4
                    +12.34 != 0x4028AE147AE147AE == 0x4028AE147AE147AE
                    +12.34 !=         0x414570A4 == 0x414570A4
                    -12.34 != 0xC028AE147AE147AE == 0xC028AE147AE147AE
                    -12.34 !=         0xC14570A4 == 0xC14570A4
                    00.100 != 0x3FB999999999999A == 0x3FB999999999999A
                    00.100 !=         0x3DCCCCCD == 0x3DCCCCCD
                    00100. == 0x4059000000000000 == 0x4059000000000000
                    00100. ==         0x42C80000 == 0x42C80000
               00100.00100 != 0x40590010624DD2F2 == 0x40590010624DD2F2
               00100.00100 !=         0x42C80083 == 0x42C80083
                       1e4 == 0x40C3880000000000 == 0x40C3880000000000
                       1e4 ==         0x461C4000 == 0x461C4000
                       0.5 == 0x3FE0000000000000 == 0x3FE0000000000000
                       0.5 ==         0x3F000000 == 0x3F000000
                       0.6 != 0x3FE3333333333333 == 0x3FE3333333333333
                       0.6 !=         0x3F19999A == 0x3F19999A
                      0.25 == 0x3FD0000000000000 == 0x3FD0000000000000
                      0.25 ==         0x3E800000 == 0x3E800000
                      0.26 != 0x3FD0A3D70A3D70A4 == 0x3FD0A3D70A3D70A4
                      0.26 !=         0x3E851EB8 == 0x3E851EB8
                     0.125 == 0x3FC0000000000000 == 0x3FC0000000000000
                     0.125 ==         0x3E000000 == 0x3E000000
                     0.126 != 0x3FC020C49BA5E354 == 0x3FC020C49BA5E354
                     0.126 !=         0x3E010625 == 0x3E010625
                    0.0625 == 0x3FB0000000000000 == 0x3FB0000000000000
                    0.0625 ==         0x3D800000 == 0x3D800000
                    0.0624 != 0x3FAFF2E48E8A71DE == 0x3FAFF2E48E8A71DE
                    0.0624 !=         0x3D7F9724 == 0x3D7F9724
                   0.03125 == 0x3FA0000000000000 == 0x3FA0000000000000
                   0.03125 ==         0x3D000000 == 0x3D000000
                   0.03124 != 0x3F9FFD60E94EE393 == 0x3F9FFD60E94EE393
                   0.03124 !=         0x3CFFEB07 == 0x3CFFEB07
                      1e23 != 0x44B52D02C7E14AF6 == 0x44B52D02C7E14AF6
                      1e23 !=         0x65A96816 == 0x65A96816
                     1E-23 != 0x3B282DB34012B251 == 0x3B282DB34012B251
                     1E-23 !=         0x19416D9A == 0x19416D9A
                     1e+23 != 0x44B52D02C7E14AF6 == 0x44B52D02C7E14AF6
                     1e+23 !=         0x65A96816 == 0x65A96816
                  12.34E56 != 0x4BC929C7D37D0D30 == 0x4BC929C7D37D0D30
                  12.34E56 !=         0x7F800000 == 0x7F800000
                +12.34E+56 != 0x4BC929C7D37D0D30 == 0x4BC929C7D37D0D30
                +12.34E+56 !=         0x7F800000 == 0x7F800000
                -12.34e-56 != 0xB48834C13CBF331D == 0xB48834C13CBF331D
                -12.34e-56 !=         0x80000000 == 0x80000000
                  +.12E+34 != 0x46CD95108F882522 == 0x46CD95108F882522
                  +.12E+34 !=         0x766CA884 == 0x766CA884
                  -.12e-34 != 0xB8AFE6C6DCC3C5AC == 0xB8AFE6C6DCC3C5AC
                  -.12e-34 !=         0x857F3637 == 0x857F3637
              3.4028234e38 != 0x47EFFFFFD586B834 == 0x47EFFFFFD586B834
              3.4028234e38 !=         0x7F7FFFFF == 0x7F7FFFFF
              3.4028235e38 != 0x47EFFFFFE54DAFF8 == 0x47EFFFFFE54DAFF8
              3.4028235e38 !=         0x7F7FFFFF == 0x7F7FFFFF
              3.4028236e38 != 0x47EFFFFFF514A7BC == 0x47EFFFFFF514A7BC
              3.4028236e38 !=         0x7F800000 == 0x7F800000
    1.7976931348623158e308 != 0x7FEFFFFFFFFFFFFF == 0x7FEFFFFFFFFFFFFF
    1.7976931348623158e308 !=         0x7F800000 == 0x7F800000
    1.7976931348623159e308 != 0x7FF0000000000000 == 0x7FF0000000000000
    1.7976931348623159e308 !=         0x7F800000 == 0x7F800000
                    1e1000 != 0x7FF0000000000000 == 0x7FF0000000000000
                    1e1000 !=         0x7F800000 == 0x7F800000
   -1.7976931348623158e308 != 0xFFEFFFFFFFFFFFFF == 0xFFEFFFFFFFFFFFFF
   -1.7976931348623158e308 !=         0xFF800000 == 0xFF800000
   -1.7976931348623159e308 != 0xFFF0000000000000 == 0xFFF0000000000000
   -1.7976931348623159e308 !=         0xFF800000 == 0xFF800000
   2.2250738585072014e-308 != 0x0010000000000000 == 0x0010000000000000
   2.2250738585072014e-308 !=         0x00000000 == 0x00000000
   2.2250738585072013e-308 != 0x0010000000000000 == 0x0010000000000000
   2.2250738585072013e-308 !=         0x00000000 == 0x00000000
   2.2250738585072012e-308 != 0x0010000000000000 == 0x0010000000000000
   2.2250738585072012e-308 !=         0x00000000 == 0x00000000
   2.2250738585072011e-308 != 0x000FFFFFFFFFFFFF == 0x000FFFFFFFFFFFFF
   2.2250738585072011e-308 !=         0x00000000 == 0x00000000
   4.9406564584124654e-324 != 0x0000000000000001 == 0x0000000000000001
   4.9406564584124654e-324 !=         0x00000000 == 0x00000000
   2.4703282292062328e-324 != 0x0000000000000001 == 0x0000000000000001
   2.4703282292062328e-324 !=         0x00000000 == 0x00000000
   2.4703282292062327e-324 != 0x0000000000000000 == 0x0000000000000000
   2.4703282292062327e-324 !=         0x00000000 == 0x00000000
  -4.9406564584124654e-325 != 0x8000000000000000 == 0x8000000000000000
  -4.9406564584124654e-325 !=         0x80000000 == 0x80000000
                   1e-1000 != 0x0000000000000000 == 0x0000000000000000
                   1e-1000 !=         0x00000000 == 0x00000000
                     5e-20 != 0x3BED83C94FB6D2AC == 0x3BED83C94FB6D2AC
                     5e-20 !=         0x1F6C1E4A == 0x1F6C1E4A
                    67e+14 == 0x4337CD9D4FFEC000 == 0x4337CD9D4FFEC000
                    67e+14 !=         0x59BE6CEA == 0x59BE6CEA
                   985e+15 == 0x43AB56D88FFF8500 == 0x43AB56D88FFF8500
                   985e+15 !=         0x5D5AB6C4 == 0x5D5AB6C4
                  7693e-42 != 0x3804F13D0FFFE4A1 == 0x3804F13D0FFFE4A1
                  7693e-42 !=         0x0053C4F4 == 0x0053C4F4
                 55895e-16 != 0x3D989537AFFFFFE1 == 0x3D989537AFFFFFE1
                 55895e-16 !=         0x2CC4A9BD == 0x2CC4A9BD
                996622e-44 != 0x380B21710FFFFFFB == 0x380B21710FFFFFFB
                996622e-44 !=         0x006C85C4 == 0x006C85C4
               7038531e-32 != 0x3AB5C87FB0000000 == 0x3AB5C87FB0000000
               7038531e-32 !=         0x15AE43FD == 0x15AE43FD
              60419369e-46 != 0x3800729D90000000 == 0x3800729D90000000
              60419369e-46 !=         0x0041CA76 == 0x0041CA76
             702990899e-20 != 0x3D9EEAF950000000 == 0x3D9EEAF950000000
             702990899e-20 !=         0x2CF757CA == 0x2CF757CA
            6930161142e-48 != 0x3802DD9E10000000 == 0x3802DD9E10000000
            6930161142e-48 !=         0x004B7678 == 0x004B7678
           25933168707e+13 != 0x44CB753310000000 == 0x44CB753310000000
           25933168707e+13 !=         0x665BA998 == 0x665BA998
          596428896559e+20 != 0x4687866490000000 == 0x4687866490000000
          596428896559e+20 !=         0x743C3324 == 0x743C3324
                     3e-23 != 0x3B422246700E05BD == 0x3B422246700E05BD
                     3e-23 !=         0x1A111234 == 0x1A111234
                    57e+18 == 0x4408B84570022A20 == 0x4408B84570022A20
                    57e+18 !=         0x6045C22C == 0x6045C22C
                   789e-35 != 0x39447BCDF000340C == 0x39447BCDF000340C
                   789e-35 !=         0x0A23DE70 == 0x0A23DE70
...
errors: 0

The first == or != on each line of the output tells whether or not the obtained float/double represents the decimal input exactly.

The second == or != tells whether or not the calculated float/double matches the one generated by the compiler. The first hex number is from DecimalToIeee754Binary() and the second is from the compiler.

UPD: The code was compiled with gcc 4.6.2 and Open Watcom C/C++ 1.9.

score 6 · Accepted Answer · answered Dec 10 '11 at 18:25

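Here's a Python snippet that does exactly what you ask for; it needs Python 2.7 or Python 3.2+. (Earlier versions of Python are less careful with floating-point conversions.) It works because `decimal.Decimal` holds the value of the input string exactly, `float` rounds it to the nearest double, and `==` between a `Decimal` and a `float` compares their exact values.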

import decimal, sys
s = sys.argv[1]
if decimal.Decimal(s) == float(s):
    print("Exactly representable")
else:
    print("Not exactly representable")

Usage: after saving the script under the name 'exactly_representable.py',

mdickinson$ python exactly_representable.py 1.25
Exactly representable
mdickinson$ python exactly_representable.py 0.1
Not exactly representable
mdickinson$ python exactly_representable.py 1e22
Exactly representable
mdickinson$ python exactly_representable.py 1e23
Not exactly representable

Paul R · Answer 3 · score 2 · 2011-12-08T10:43:12.543

It should be pretty simple to write such a tool:

input value as string
convert to double
convert back to string
compare with input

Care would need to be taken to ensure that no rounding takes place in the conversions to/from double.

answered Dec 08 '11 at 09:41 · edited Dec 08 '11 at 10:43 · Paul R

You are missing the rounding, which may implicitly take place in the conversion to string – Gunther Piez Dec 08 '11 at 09:43
@drhirsch: why would there be rounding if the number is exactly representable ? – Paul R Dec 08 '11 at 09:50
@PaulR: The case of interest is when the number is _not_ exactly representable. In this case, the conversion to string may still "round" the number to the "closest sensible" decimal number. For example, `float f=0.1; printf("%g", f);` will output "0.1" (which, in most cases, is what people actually want). – Martin B Dec 08 '11 at 09:54

Consider 1/2^50, which is surely representable as a double (which has a 53-bit mantissa), but which reads as 0.00000000000000088817841970012523233890533447265625 in decimal, which will be truncated to 19 or 20 significant digits. Or worse. And if the number is not representable, rounding will take place too: just consider 0.1, which may be read in, converted to some approximation of 0.1, and printed as 0.1 because of rounding. – Gunther Piez Dec 08 '11 at 09:59
`float f=0.1; printf("%.*G\n", f);` will print `0.100000001490116119384765625` , so it surely can be used for checking. – Daniel Fekete Dec 08 '11 at 10:15
OK - got it - thanks @drhirsch and others. Not sure whether to leave this answer for future reference or delete it due to down-votes ? – Paul R Dec 08 '11 at 10:17
@Daniel: I didn't know this _printf_ format option. So this answer is actually correct. – Gunther Piez Dec 08 '11 at 10:23
I don't think this should necessarily be deleted, but it requires that "convert to double" actually chooses the closest possible value, and "convert back to string" is done exactly, which I think requires 54 significant figures or so. It's probably not obvious (except to Daniel) how to write the latter, and I suspect that the former isn't guaranteed by things like `strtod` although I haven't checked. So it needs some non-trivial detail to make it work. – Steve Jessop Dec 08 '11 at 10:23
Just write in your answer that you need to make sure no rounding takes place – Gunther Piez Dec 08 '11 at 10:24
Ah, regarding `strtod`, it's supposed to interpret the input the same way as the compiler interprets a floating-point constant. So assuming the compiler and the library do correspond, it should tell you whether that value *as a constant in the source* is exactly represented. Which is probably what you want to know: if the number you're trying to type has an exact representation, but the compiler doesn't find it, then that's a "negative". One option is to paste the string value into a C++ source file, compile and run it, to be certain you're testing the compiler and not the library. – Steve Jessop Dec 08 '11 at 10:32

@Daniel: You omitted the third parameter, which should give the precision. I just wondered why a simple printf consumed 4G of memory and took 10 seconds to execute, until I realized that some random (obviously huge) value was pulled from the stack. – Gunther Piez Dec 08 '11 at 11:00
@drhirsch: Good catch. Instead of passing the precision as a parameter, it could, of course, be put directly into the format string. So immediately the question becomes: What value should be used? It seems that the original code was just fortuitously pulling a "large enough" value from the stack. (BTW `%G` differs from `%g` only in that the exponential notation, if used, uses a capital instead of lower-case E.) – Martin B Dec 08 '11 at 14:36

score 2 · Answer 4 · answered Dec 08 '11 at 09:43

While this isn't exactly what you need, it's kind of close:

http://www.h-schmidt.net/FloatApplet/IEEE754.html

You'll need a bit of interpretation to figure out if your values can be represented exactly in binary floating point, but since you've only got three or four values, that should be OK.

As an example of how you might use this, enter "0.1" in the "decimal representation" field.

If we examine the binary representation, we see that the mantissa appears to be a repeating sequence, which is already a sign that we can't represent the value exactly:

0 01111011 10011001100110011001101

(For better readability, I've put spaces here between the sign, exponent, and mantissa.)

Another indication is the "with double precision" field. It extends the single-precision binary floating-point number to double precision by padding the mantissa with zeros, then converts the result back to decimal. If the number could be represented exactly, we would expect to see the number we originally entered; in this case, though, we see 0.10000000149011612. This is a further indication that 0.1 cannot be represented exactly in binary floating point.

score 1 · Answer 5 · answered Dec 08 '11 at 14:11

My arbitrary-precision decimal-to-binary converter might be of help. There are two cases to consider:

1) Integer value: just check that there are no 1 bits after the 53rd bit position, counting from the most significant 1 bit (you'll have to count by hand).

2) Fractional value or mixed number: if the 'Num Digits' field for the fractional part shows the infinity symbol (∞), the value is not exact; if that field is finite, the number is exact as long as the two 'Num Digits' fields add up to 53 or less.

Excellent -- this looks like exactly what the OP needs. – Martin B Dec 08 '11 at 14:39
