3

I would like to know the easiest, most portable and generally considered best practice to achieve this, that works for any number. I also would like the string associated with the number to be in decimal representation, without scientific notation if possible.

Ernaldo
  • 1
    Scientific notation/format with 52 decimal digits will work. Simply for storage you could also use the hexadecimal encoding. – Lutz Lehmann May 17 '23 at 10:43
  • 1
    Without scientific notation consider the extreme values provided [here](https://stackoverflow.com/a/48650578/1312382) – didn't count, but you'll potentially need pretty large arrays at least for double... – Aconcagua May 17 '23 at 10:55
  • *'most portable way'* – then avoid floating point entirely, if meaningfully possible, and instead fall back to fixed-point arithmetic (e. g. using millimeters instead of meters, microseconds instead of seconds, cents instead of euros/dollars or tenth of – depending on your precision requirements...). That way you can work with simple integer types and can avoid all that hassle around floating point. – Aconcagua May 17 '23 at 11:10
  • 2
    To do that, you will need to print the float/double without any loss of precision. A way to do that is found here: https://codereview.stackexchange.com/questions/212490/function-to-print-a-double-exactly. – nielsen May 17 '23 at 11:10
  • See [Printf width specifier to maintain precision of floating-point value](https://stackoverflow.com/q/16839658/2410359) – chux - Reinstate Monica May 17 '23 at 14:20
  • @nielsen I might be missing something obvious, but I don't get why that post says that a double can store up to 17 significant digits and shows a valid double number being hundreds of digits long – Ernaldo May 17 '23 at 20:52
  • @Ernaldo That is the cost of dropping scientific notation. IEEE 754 `double` can represent numbers of a magnitude of around 2^1023 ~ 10^308, i.e. more than 300 digits, but only with around 16 significant decimal digits. – nielsen May 17 '23 at 21:17
  • @Ernaldo : This is a frequent misconception. With 53 mantissa bits you can store about 15.8 decimal digits. A double float stored as a string with 17 decimal digits can be restored to the same binary representation of a double. – Lutz Lehmann May 29 '23 at 12:54

5 Answers

10

There are two questions:

  1. What format do you need, and
  2. How many significant digits do you need?

You said you wanted to avoid scientific notation if possible, and that's fine, but printing numbers like 0.00000000000000000123 or 12300000000000000000 gets kind of unreasonable, so you might want to switch to scientific notation for really big or really small numbers.

As it happens, there's a printf format that does exactly that: %g. It acts like %f if it can, but switches to %e if it has to.

And then there's the question of the number of digits. You need enough digits to preserve the internal precision of the float or double value. To make a long story short, the number of digits you want is the predefined constant FLT_DECIMAL_DIG or DBL_DECIMAL_DIG.

So, putting this all together, you can convert a float to a string like this:

sprintf(str, "%.*g", FLT_DECIMAL_DIG, f);

The technique for a double is perfectly analogous:

sprintf(str, "%.*g", DBL_DECIMAL_DIG, d);

In both cases, we use an indirect technique to tell %g how many significant digits we want. We could have used %g to let it pick, or we could have used something like %.10g to request 10 significant digits, but here we use %.*g, where the * says to use a passed-in argument to specify the number of significant digits. This lets us plug in the exact value FLT_DECIMAL_DIG or DBL_DECIMAL_DIG from <float.h>.

(There's also the question of how big a string you might need. More on this below.)

And then you can convert back from a string to a float or double using atof, strtod, or sscanf:

f = atof(str);
d = strtod(str, &str2);
sscanf(str, "%g", &f);
sscanf(str, "%lg", &d);

(By the way, scanf and friends don't really care about the format so much — you could use %e, %f, or %g, and they'd all work exactly the same.)

Here is a demonstration program tying all of this together:

#include <stdio.h>
#include <stdlib.h>
#include <float.h>

int main()
{
    double d1, d2;
    char str[DBL_DECIMAL_DIG + 10];

    while(1) {
        printf("Enter a floating-point number: ");
        fflush(stdout);

        if(scanf("%lf", &d1) != 1) {
            printf("okay, we're done\n");
            break;
        }

        printf("you entered: %g\n", d1);

        snprintf(str, sizeof(str), "%.*g", DBL_DECIMAL_DIG, d1);

        printf("converted to string: %s\n", str);

        d2 = strtod(str, NULL);

        printf("converted back to double: %g\n", d2);

        if(d1 != d2)
            printf("whoops, they don't match!\n");

        printf("\n");
    }
}

This program prompts for a double value d1, converts it to a string, converts it back to a double d2, and checks to make sure the values match. There are several things to note:

  1. The code picks a size char str[DBL_DECIMAL_DIG + 10] for the converted string. That should always be enough for the digits, a sign, an exponent, and the terminating '\0'.
  2. The code uses the (highly recommended) alternative function snprintf instead of sprintf, so that the destination buffer size can be passed in, to make sure it doesn't overflow if by some mischance it's not big enough, after all.
  3. You will hear it said that you should never compare floating-point numbers for exact equality, but this is a case where we want to! If after going around the barn, d1 is not exactly equal to d2, something has gone wrong.
  4. Although this code checks to make sure that d1 == d2, it quietly glosses over the fact that d1 might not have been exactly equal to the number you entered! Most real numbers (and most decimal fractions) cannot be represented exactly as a finite-precision float or double value. If you enter a seemingly "simple" fraction like 0.1 or 123.456, d1 will not have exactly that value. d1 will be a very close approximation — and then, assuming everything else works correctly, d2 will end up containing exactly the same very close approximation. To see what's really going on here, you can increase the precision printed by the "you entered" and "converted back to double" lines. See also Is floating-point math broken?
  5. The number of significant digits we care about here — the precision we give to %g when we say %.10g or %.*g — is a number of significant digits. It is not just a count of places past the decimal. For example, the numbers 1234000, 12.34, 0.1234, and 0.00001234 all have four significant digits.
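Points 1 and 2 above concern the buffer size. As a minimal sketch of an alternative, snprintf can be asked how much space it needs before anything is written. The helper name double_to_string is hypothetical, not part of any library, and the sketch assumes a C11 <float.h> that defines DBL_DECIMAL_DIG.

```c
#include <float.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format d with DBL_DECIMAL_DIG significant digits
   into a freshly malloc'ed string. Returns NULL on failure; caller frees. */
char *double_to_string(double d)
{
    /* With a NULL buffer and size 0, snprintf writes nothing and returns
       the number of characters the output needs (excluding the '\0'). */
    int needed = snprintf(NULL, 0, "%.*g", DBL_DECIMAL_DIG, d);
    if (needed < 0)
        return NULL;

    char *str = malloc((size_t)needed + 1);
    if (str == NULL)
        return NULL;

    /* Second call does the real formatting into the right-sized buffer. */
    snprintf(str, (size_t)needed + 1, "%.*g", DBL_DECIMAL_DIG, d);
    return str;
}
```

A caller would write something like char *s = double_to_string(d); and later free(s). For a fixed-size buffer, as in the demonstration program, checking that snprintf's return value is smaller than sizeof(str) detects truncation in the same way.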

Up above I said, "To make a long story short, the number of digits you want is the predefined constant FLT_DECIMAL_DIG or DBL_DECIMAL_DIG." These constants are literally the minimum number of significant digits required to take an internal floating-point value, convert it to a decimal (string) representation, convert it back to an internal floating-point value, and get exactly the same value back. That's obviously precisely what we want here. There's another, seemingly similar, pair of constants, FLT_DIG and DBL_DIG which give the minimum number of digits you're guaranteed to preserve if you convert from an external, decimal (string) representation, to an internal floating-point value, and then back to decimal. For typical IEEE-754 implementations, FLT_DIG/DBL_DIG are 6 and 15, while FLT_DECIMAL_DIG/DBL_DECIMAL_DIG are 9 and 17. See this SO answer for more on this.

FLT_DECIMAL_DIG and DBL_DECIMAL_DIG are the minimum number of digits necessary to guarantee a round-trip binary-to-decimal-to-binary conversion, but they are not necessarily enough to show precisely what the actual, internal, binary value is. For those you might need as many decimal digits as there are binary bits in the significand. For example, if we start with the decimal number 123.456, and convert it to float, we get something like 123.45600128... . If we print it with FLT_DECIMAL_DIG or 9 significant digits, we get 123.456001, and that converts back to 123.45600128..., so we've succeeded. But the actual internal value is 7b.74bc8 in base 16, or 1111011.01110100101111001 in binary, with 24 significant bits. The actual, full-precision decimal conversion of those numbers is 123.45600128173828125.
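The difference between "enough digits to round-trip" and "enough digits to show the exact value" can be tried out with a small sketch like the following. The helper name format_float is hypothetical, and the exact-value output assumes a printf implementation that produces correctly rounded digits at high precision (as good modern C libraries do).

```c
#include <stdio.h>

/* Hypothetical helper: write f into buf using `digits` significant digits.
   Returns what snprintf returns (the length the output needs). */
int format_float(char *buf, size_t size, float f, int digits)
{
    /* A float argument is promoted to double when passed to snprintf;
       that promotion is exact, so the value printed is the float's value. */
    return snprintf(buf, size, "%.*g", digits, f);
}
```

Calling format_float(buf, sizeof buf, 123.456f, 9) should give 123.456001, which is enough to round-trip, while asking for 20 significant digits should give the full 123.45600128173828125 described above.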


Addendum: It must be noted that accurately transmitting floating-point values as decimal strings in this way does absolutely demand:

  1. A well-constructed floating-point-to-decimal-string converter (e.g. sprintf %g). When converting N bits to M digits, they must always be M properly rounded digits.
  2. Sufficient digits (FLT_DECIMAL_DIG or DBL_DECIMAL_DIG, as discussed above).
  3. A well-constructed decimal-string-to-floating-point converter (e.g. strtod()). When converting N digits to M bits, they must always be M properly rounded bits.

The IEEE-754 standard does require properties (1) and (3). But implementations not conforming to IEEE-754 might not do so well. (It turns out that property (1), in particular, is remarkably difficult to achieve, although techniques for doing so are now well understood.)


Addendum 2: I have performed empirical tests using a modification of the above program, looping over many values, not just individual ones scanf'ed from the user. In this "regression test" version, I have replaced the test

if(d1 != d2)
    printf("whoops, they don't match!\n");

with

if(d1 != d2 && (!isnan(d1) || !isnan(d2)))
    printf("whoops, they don't match!\n");

(That is, when the numbers don't match, it's an error only if one of them is not a NaN.)

Anyway, I have tested all 4,294,967,296 values of type float. I have tested 100,000,000,000 randomly-selected values of type double (which is, to be fair, a tiny fraction of them). Not once (except for deliberately-induced errors, to test the tests) have I seen it print "whoops, they don't match!".
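For a test harness like the one described, the comparison can be pulled into a small function. This is a sketch, not the code used for the tests above: the name same_double is hypothetical, and it additionally distinguishes +0.0 from -0.0, which a plain != comparison cannot do.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true if b is an acceptable round-trip result for a. */
bool same_double(double a, double b)
{
    /* NaN never compares equal to anything, so treat any NaN as matching
       any other NaN (payloads are deliberately ignored here). */
    if (isnan(a) || isnan(b))
        return isnan(a) && isnan(b);

    if (a != b)
        return false;

    /* +0.0 == -0.0 compares true, so check the sign separately.
       signbit() only promises "nonzero", hence the ! normalization. */
    return !signbit(a) == !signbit(b);
}
```

With this, the check in the loop becomes if(!same_double(d1, d2)) followed by the same error message.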

Steve Summit
  • 1
    Array needs to be large enough, too, to hold the result – how much would that be? I'd assume the constant plus one for the period or e/E plus 3/4 for the exponent (float/double respectively) plus two for two signs (value + exponent) plus one for the null terminator? Though 32 should be fine for any of (counted 25 for double according to [cppreference](https://en.cppreference.com/w/c/types/limits) and IEEE754 provided, but 32 is a power of two...). – Aconcagua May 17 '23 at 11:28
  • 2
    @AndreasWenzel Or just do the call again with a large-enough buffer. If `snprintf()` failed, it returned how big the buffer needs to be. – Andrew Henle May 17 '23 at 11:38
  • @Aconcagua Yes, buffer size is an important consideration. Covered that now. – Steve Summit May 17 '23 at 12:01
  • `DBL_DECIMAL_DIG + 10` should be sufficient. 10 - 1 (null terminator) -1 (minus sign) - 1 (decimal point) - 1 (`e`) - 1 (exponent sign) leaves 5 characters for the exponent digits, and a IEEE754 64-bit double only needs 3 characters for the exponent digits. In general, the maximum number of exponent digits required to print a `double` is `ceil(log10(DBL_DECIMAL_DIG - DBL_MIN_10_EXP))` (allowing for subnormal values). That will be at least 2 due to minimum requirements of the C standard, which means it is also sufficient when the `%g` selects `%f` format. – Ian Abbott May 17 '23 at 13:44
  • Could add `"%a"` to the "you could use %e, %f, or %g, and they'd all work exactly the same" list. – chux - Reinstate Monica May 17 '23 at 14:24
  • Corner, `if(d1 != d2)` fails to detect a sign change with negative zero. @DevSolar [idea](https://stackoverflow.com/a/76271885/2410359) to check _binary representation_ is useful for testing zero. NAN compare is another issue. – chux - Reinstate Monica May 17 '23 at 14:34
3

Every printf() / scanf() (/ strtod()) implementation that is not utterly outdated (and, thus, bugged) should be able to make the round-trip without loss of precision. It is important, though, that you compare the floating point representation pre and post roundtrip, not what is printed as a string. An implementation is perfectly allowed to print an approximation of the binary value, as long as it unambiguously identifies that binary value. (Note that there are many more possible decimal representations than binary ones.)

If you are interested in the details of how this is done, the algorithm is called Dragon 4. A nice introduction on the subject is available here.

If you don't care for readability of the string too much, go for the %a conversion specifier. This prints / reads the float's mantissa as hexadecimal (with a decimal exponent). This avoids the binary / decimal conversion altogether. You also do not need to worry about specifying how many digits of precision should be printed, as the default is to print the precise value.

DevSolar
  • A _binary representation_ compare has challenges/features too. 1) When a FP number has multiple encodings 2) FP has padding (e.g. some `long double`) which need not compare the same. 3) Round-tripping NANs with a payload is its own dilemma. – chux - Reinstate Monica May 17 '23 at 14:39
  • @chux-ReinstateMonica With "binary representation" compare I meant comparing the pre-roundtrip `double` / `long double` with the post-roundtrip, not bit by bit but by `==`. That should ignore padding. Attempting to round-trip NaNs (which are not even equal to themselves) is pretty pointless. And I don't know of any FP that would have multiple valid (normalized) encodings? – DevSolar May 17 '23 at 15:07
1

I would like to know the easiest, most portable and generally considered best practice to achieve this, that works for any number. I also would like the string associated with the number to be in decimal representation, without scientific notation if possible.

... works for any number

This is challenging to do well in general. Unusual considerations that need assessment include:

Without scientific notation

Some quality standard libraries will perform high-precision text conversions without loss.

double x = -DBL_TRUE_MIN;

#define PRECISION_NEED (DBL_DECIMAL_DIG - DBL_MIN_10_EXP - 1)
//            sign 1   .     fraction       \0
#define BUF_N (1 + 1 + 1 + PRECISION_NEED + 1)
char buf[BUF_N];
sprintf(buf, "%.*f", PRECISION_NEED, x);

if (atof(buf) == x) ...

Or you can code it yourself, yet that is not simple.

Best practice

Use sprintf(large_enough_buffer, "%.*g", DBL_DECIMAL_DIG, x) as suggested by many as the first step.

chux - Reinstate Monica
0

Converting floating point numbers to decimal is not an exact process (unless you use very long strings - see comments), nor is doing the converse. If it's important that the floating point numbers read back are exactly the same, bit for bit, then you need to preserve the binary representation, possibly as a hex string as shown below. This preserves non-numerical values like NAN and +-INF. The hex string can safely be written to memory or a file.

If you need it to be human readable, then you could invent your own string format which uses both, for example by prepending the hex representation to the decimal string. Then when the number is converted back to a float, it will use the hex value, not the decimal value, and so will have exactly the same value as the original. The hex string only requires a fixed 8 characters for a 4-byte float, so it is not so expensive. As others have pointed out, it can be non-obvious to predict the size of the buffer needed to printf a float or double, especially if you want no loss of precision. See others' comments and answers for options and hazards on how to print a human-readable representation.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <math.h>

/********************************************************************************************
// Floating Points As Hex Strings
// ==============================
// Author: Simon Goater May 2023.
//
// The binary representation of floats must be same for source and destination floats.
// If the endianess of source and destination differ, the hex characters must be 
// permuted accordingly.
*/
typedef union {
  float f;
  double d;
  long double ld;
  unsigned char c[16];
} fpchar_t;

const unsigned char hexchar[16] = {0x30, 0x31, 0x32, 0x33, 
    0x34, 0x35, 0x36, 0x37, 
    0x38, 0x39, 0x41, 0x42, 
    0x43, 0x44, 0x45, 0x46};
const unsigned char binchar[23] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 
  0, 0, 0, 0, 0, 10, 11, 12, 13, 14, 15};
    
void fptostring(void* f, unsigned char* string, uint8_t sizeoffp) {
  fpchar_t floatstring;  
  memcpy(&floatstring.c, f, sizeoffp);
  int i, stringix;
  stringix = 0;
  unsigned char thischar;
  for (i=0; i<sizeoffp; i++) {
    thischar = floatstring.c[i];
    string[stringix] = hexchar[thischar >> 4];
    stringix++;
    string[stringix] = hexchar[thischar & 0xf];
    stringix++;
  }
}

void stringtofp(void* f, unsigned char* string, uint8_t sizeoffp) {
  fpchar_t floatstring;
  int i, stringix;
  stringix = 0;
  for (i=0; i<sizeoffp; i++) {
    floatstring.c[i] = binchar[(string[stringix] - 0x30) % 23] << 4;
    stringix++;
    floatstring.c[i] += binchar[(string[stringix] - 0x30) % 23];
    stringix++;
  }
  memcpy(f, &floatstring.c, sizeoffp);
}

_Bool isfpstring(void* f, unsigned char* string, uint8_t sizeoffp) {
  // Validates the floatstring and if ok, copies value to f.
  int i;
  for (i=0; i<2*sizeoffp; i++) {
    if (string[i] < 0x30) return false;
    if (string[i] > 0x46) return false;
    if ((string[i] > 0x39) && (string[i] < 0x41)) return false;
  }
  stringtofp(f, string, sizeoffp);
  return true;
}

/********************************************************************************************
// Floating Points As Hex Strings - END
// ====================================
*/

int main(int argc, char **argv)
{
  //float f = 1.23f;
  //double f = 1.23;
  long double f = 1.23;
  if (argc > 1) f = atof(argv[1]);
  unsigned char floatstring[33] = {0};
  //printf("fpval = %.32f\n", f);
  printf("fpval = %.32Lf\n", f);
  fptostring((void*)&f, (unsigned char*)floatstring, sizeof(f));
  printf("floathex = %s\n", floatstring);
  f = 1.23f;
  //floatstring[0] = 'a';
  if (isfpstring((void*)&f, (unsigned char*)floatstring, sizeof(f))) {
    //printf("fpval = %.32f\n", f);
    printf("fpval = %.32Lf\n", f);
  } else {
    printf("Error converting floating point from hex.\n");
  }
  exit(0);
}
Simon Goater
  • "Converting floating point numbers to decimal is not an exact process ..." --> Conversion to decimal text can be done exactly for all finite FP. With a good `printf()` implementation using high precision or with you own code: [Function to print a double - exactly](https://codereview.stackexchange.com/q/212490/29485). – chux - Reinstate Monica May 17 '23 at 14:43
  • I saw your post, and while interesting, I didn't see anything that convinced me that the decimal digit representation must always terminate. Where is the proof of that? Also, using potentially hundreds of characters makes it an unattractive option in my opinion, but you're right that I didn't prove my statement either and it may strictly be false. – Simon Goater May 17 '23 at 14:57
  • 2
    "Where is the proof of that?" each power-of-2 is _exact_ in decimal: ... 4, 2, 1, 0.5, 0.25, 0.125 ... All finite FP are sums of these exact values. – chux - Reinstate Monica May 17 '23 at 15:12
  • 1
    Ah yes, of course. So you can have up to n decimal places if the smallest number the FP can hold is 2^(-n). Do you have a function that can read your decimal representations back in? – Simon Goater May 17 '23 at 16:13
  • " function that can read your decimal representations back in" --> `atof()`, `strtod()` will suffice. – chux - Reinstate Monica May 17 '23 at 17:48
  • Even [IEEE-754](https://en.wikipedia.org/wiki/IEEE_754) does not require that all leading decimal digits are relevant. IIRC, after `DBL_DECIMAL_DIG + 3` significant digits, the rest can be assumed as 0, even if that would result in a different `double`. – chux - Reinstate Monica May 17 '23 at 17:51
0

As a general rule, this can be impossible to achieve, because in the decoding process two different implementations can produce different floating-point values. The reason is that the mapping between a number written as a decimal ASCII string and its internal binary representation is not bijective. Sometimes a decimal floating-point number (e.g. 0.1) has no finite representation as a binary number (0.1 decimal converts to 0.00011001100110011001100110011001100... binary) and cannot be represented as a finite bit sequence (just as dividing 1.0 by 3.0 gives the infinite sequence 0.333333333333...).

Converting a finite binary number to decimal is always possible: every finite floating-point number (one with a finite binary representation) results in a finite (although possibly very long) decimal string. This means that there are more finite decimal strings than finite binary representations. Consequently, the decimal-to-binary mapping is always many-to-one: several finite decimal numbers are mapped to the same binary image.

This could be handled by the implementation. Since the correspondence from binary to decimal is injective, and every binary value can be converted, we can build an inverse that maps each decimal representation found back to the original value (we are dealing with finite sets, so at worst we can do it case by case). For example, every decimal number closest to a given binary value can be converted back to that same value. But there is another drawback that hinders building the mapping. An arbitrary finite-length binary string always maps to a finite-length decimal string, but storing a binary fraction with full decimal precision requires around one decimal significant digit per binary digit of the representation, so

0.1(bin) --> 0.5(dec)  (one digit each)

while

0.0001(bin) --> 0.0625(dec) (four digits after the decimal point)
1.0 * 2^-32 -->  0.00000000023283064365386962890625 (23 significant digits, 32 digits after the decimal point)

and growing. Keeping the computation bounded (in both the decimal and binary number systems) and rounding means that a number is rounded to the nearest decimal value (using decimal rounding), but when the number is read back into the computer, the closest value (this time using binary rounding, or the closest-value approach described above) may be the next or the previous number relative to the original one, so the number retrieved after saving differs from the original.

But... you can consider saving a number in ascii binary form.

This way, you guarantee that the stored number is exactly the same as the original one (because both processes round in the same number base, which makes the correspondence bijective). It should be easy to make such a conversion, so you get a portable and exact serialization of binary floating-point numbers. This can be done in a bounded and exact way, so you never incur rounding errors, and you guarantee that your data is successfully saved and later restored.

On today's architectures, IEEE-754 is the widely used standard for internal binary floating-point representation. So a simple mapping, such as writing the bytes of the representation in hexadecimal, from the byte holding the sign down to the least significant bit of the significand, is a good and efficient starting point. Another good conversion is base64 encoding of the big-endian IEEE-754 binary representation (as described above), which lets you encode any double (including NaNs and infinities) in an architecture-independent way into 11 ASCII characters, or a float into 6 ASCII characters (without padding).
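The hexadecimal variant can be sketched as follows. The function names are hypothetical, and the code assumes that double is a 64-bit IEEE-754 binary64 value with the same byte order as uint64_t. Going through an integer rather than through raw bytes is what makes the resulting text independent of the machine's endianness.

```c
#include <ctype.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: double is IEEE-754 binary64 and shares uint64_t's byte order. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Hypothetical helper: write the 16 hex digits of d's bit pattern, sign bit
   first, followed by a terminating '\0' (17 bytes in total). */
void double_to_bits_hex(double d, char out[17])
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);          /* well-defined way to read bits */
    snprintf(out, 17, "%016" PRIx64, bits);
}

/* Hypothetical helper: the inverse. Returns 0 on success, -1 on bad input. */
int bits_hex_to_double(const char *in, double *d)
{
    if (strlen(in) != 16)
        return -1;
    for (int i = 0; i < 16; i++)
        if (!isxdigit((unsigned char)in[i]))
            return -1;

    uint64_t bits = strtoull(in, NULL, 16);
    memcpy(d, &bits, sizeof *d);
    return 0;
}
```

Under these assumptions 1.0 is written as 3ff0000000000000 and -0.0 as 8000000000000000, and NaN payloads and infinities survive unchanged because no numeric conversion takes place.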

Luis Colorado
  • For high-quality float-to-string and string-to-float implementations — which IMO ought to be the norm today — this certainly *is* possible. Although using hexadecimal string representations is certainly attractive, it is not necessary. As long as you preserve `FLT_DECIMAL_DIG` or `DBL_DECIMAL_DIG` digits of precision, decimal representations really ought to work also. (Alas, this might rule out Microsoft.) – Steve Summit May 29 '23 at 11:56
  • No, Steve, the only way is to have a bijective map. If the map is not bijective you have a many-to-one map, which cannot be reversed. The approach you propose requires a bounded (but not easily bounded) amount of memory and so ends up not being effective. If you want to be effective, just don't convert binary to decimal digits, but store them in binary as they are stored. You can do a char per bit, a char per nibble, a char per byte, in base64, in base96, punycode, QRcode or whatever, but binary to decimal is an information loss process that results in trouble. – Luis Colorado May 29 '23 at 12:03
  • We will have to agree to disagree. I assert that binary → decimal with at least *xxx*_DECIMAL_DIG digits does *not* lose information and so can be perfectly reversed. – Steve Summit May 29 '23 at 12:07
  • The worst thing in this case is that on bounded precision systems, your rounding error depends on the digit you round at the considered rounding position, and this can make some conversions to go well, while making others to fail. And this is not compatible with a no information loss storage system. – Luis Colorado May 29 '23 at 12:07
  • probably we can disagree... but I have mathematically proven the non-bijective application of the rational numbers represented. Just disagree or signal my failure in my demonstration. – Luis Colorado May 29 '23 at 12:09
  • nope... you say *decimal with at least xxx_DECIMAL_DIG* and you are not considering the rounding errors at exactly that point. There's no possibility of reaching the full precision loss of information due to rounding, if you don't consider a suitable decimal number length (this meaning a variable length, because, as you saw above, as far as you depart from 1.0 you need more digits to represent in decimal, a binary number, not considering the divisors of 10 that are not divisors of 2) – Luis Colorado May 29 '23 at 12:13
  • Yes, you might need N digits to represent N bits (and sometimes quite a few more than N digits in the worst case). Yes, that may be a waste, but that doesn't mean it won't work, because as you point out it's always finite. And anyway there's no need to insist on a fixed-point (`%f`) representation for your decimal string. If you use `%e` or `%g`, you can get away with `DBL_DECIMAL_DIG+10` for `double` (or even a bit less), and that's ~27 bytes, and that's pretty reasonable. – Steve Summit May 29 '23 at 12:18
  • You keep mentioning rounding errors, and finite precision. Are you considering the state of the world *prior* to Dragon? I am absolutely assuming a Dragon-or-better binary-to-decimal conversion. – Steve Summit May 29 '23 at 12:18
  • please, convert 1.0E-1000(bin. exponent has been specified in base 10 to indicate the number of powers of two to be used) to decimal binary representation (exact, no rounding allowed) and then you will see why this is impractical (see that it is a number easily represented in binary form with just 8 ascii chars, but try to write it in decimal form with no loss of precision) – Luis Colorado May 29 '23 at 12:25
  • Why do I need to convert it exactly, without rounding? I merely need to convert it to a decimal string which, when converted back to binary, with N bits of precision, is guaranteed to round to exactly 1.0E-1000 (to N bits). For N bits of precision, that decimal string will be on the order of N digits. But I fear we're still talking past each other. – Steve Summit May 29 '23 at 13:02
  • because rounding is the problem here... the base 2 ticks are closer to each other than the decimal ones (at the same scale, about one third in size), and rounding can make that the closest tick rounding in decimal base, will not round to the original place, when rounding back in base 2 (I can show you a graphical example). This will make a different value to be recovered on restoring than the one that was saved... the question states _to read it as *the same* float_ and so, approximate computing cannot achieve the desired effect of restoring the value with full (not almost full) precision. – Luis Colorado May 30 '23 at 05:08
  • Well, sure, ticks of, say, 10^-15 are coarser, by a factor of about 2.3, than ticks of size 2^-51. But ticks of size 10^-16 are *finer*, by a factor of 4.4. Nobody says you have to use "the same scale", but if you use a digit or two more than the bare minimum, you're guaranteed to always get the same base-2 value back. (I can't prove this, but real number theorists can.) See also [this answer](https://stackoverflow.com/questions/61609276#61614323) (as linked from my answer). – Steve Summit May 31 '23 at 02:51
  • Let me ask you this: If it's not possible to convert a binary fraction to decimal and then back to binary without loss, what are `FLT_DECIMAL_DIG` and `DBL_DECIMAL_DIG` for, and what do they mean? – Steve Summit May 31 '23 at 02:51
  • Yes, but it is very inefficient, because of the conversion from binary to decimal. I had a look at the problem I was telling you about and realized that you can avoid it if you just add one more digit of precision on the decimal side (it's shown [here](https://www.geogebra.org/calculator/etvnfws9)). This will require one extra digit on the decimal side, to cope with the many to one mapping which becomes then a one to many, and has no problem on converting back. Sorry for the delay in answering. – Luis Colorado May 31 '23 at 05:31
  • Anyway, there's still an issue, that makes the conversion inefficient. If you had tried to convert from decimal to base2 (or the reverse, you have to do both) then you'll notice that you have to express binary powers of 2 in decimal and/or binary powers of 10 in binary form. This results in very long decimal/binary numbers that have to be carried on in the conversion. The example was 1.0E-1000(bin) on purpose, to make you see it. – Luis Colorado May 31 '23 at 05:34
  • I had to write a portable conversion of full `double` support in Java (that doesn't allow you access to the binary internal representation) to binary, and I had to manage a table of powers of ten in base 2 to cope with the scaling needed to start converting a normalized significand. – Luis Colorado May 31 '23 at 05:38