You're looking to solve essentially the same problem that Python's repr solves: finding the shortest decimal string that rounds to a given float. Except that in your case, the float isn't an IEEE 754 binary64 ("double precision") float, but an IEEE 754 binary32 ("single precision") float.
Just for the record, I should of course point out that retrieving the original string representation is impossible, since for example the strings '0.10', '0.1', '1e-1' and '10e-2' all get converted to the same float (or in this case float32). But under suitable conditions we can still hope to produce a string that has the same decimal value as the original string, and that's what I'll do below.
The approach you outline in your answer more-or-less works, but it can be streamlined a bit.
First, some bounds: when it comes to decimal representations of single-precision floats, there are two magic numbers: 6 and 9. The significance of 6 is that any (not-too-large, not-too-small) decimal numeric string with 6 or fewer significant decimal digits will round-trip correctly through a single-precision IEEE 754 float: that is, converting that string to the nearest float32, and then converting that value back to the nearest 6-digit decimal string, will produce a string with the same value as the original. For example:
>>> import numpy as np
>>> x = "634278e13"
>>> y = float(np.float32(x))
>>> y
6.342780214942106e+18
>>> "{:.6g}".format(y)
'6.34278e+18'
(Here, by "not-too-large, not-too-small" I just mean that the underflow and overflow ranges of float32 should be avoided. The property above applies for all normal values.)
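If you'd like to check the 6-digit property without relying on how np.float32 parses strings, here's a minimal sketch using only exact rational arithmetic. The helper names nearest_float32 and survives_round_trip are mine (hypothetical, not part of NumPy), and the sketch assumes positive inputs in the normal range of float32.

```python
from decimal import Decimal
from fractions import Fraction


def nearest_float32(text):
    """Correctly round a positive decimal string to the nearest float32.

    Hypothetical stand-in for np.float32(text); only valid for values in
    the normal range of float32.
    """
    value = Fraction(Decimal(text))
    # Find e with 2**e <= value < 2**(e + 1).
    e = value.numerator.bit_length() - value.denominator.bit_length()
    if Fraction(2) ** e > value:
        e -= 1
    ulp = Fraction(2) ** (e - 23)   # spacing of float32 values in this binade
    n = round(value / ulp)          # nearest multiple of ulp, ties to even
    return float(n * ulp)           # exact: every float32 is also a binary64 value


def survives_round_trip(text):
    """True if text -> float32 -> 6 significant digits keeps the decimal value."""
    back = "{:.6g}".format(nearest_float32(text))
    return Decimal(back) == Decimal(text)


print(survives_round_trip("634278e13"))   # 6 significant digits
print(survives_round_trip("16777217"))    # 8 significant digits
```

The first call uses the 6-digit string from the example above and keeps its value; the second has 8 significant digits, so nothing guarantees it survives, and in fact it doesn't.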
This means that for your problem, if the original string had 6 or fewer digits, we can recover it by simply formatting the value to 6 significant digits. So if you only care about recovering strings that had 6 or fewer significant decimal digits in the first place, you can stop reading here: a simple '{:.6g}'.format(x) is enough. If you want to solve the problem more generally, read on.
For roundtripping in the other direction, we have the opposite property: given any single-precision float x, converting that float to a 9-digit decimal string (rounding to nearest, as always), and then converting that string back to a single-precision float, will always exactly recover the value of that float.
>>> x = np.float32(3.14159265358979)
>>> x
3.1415927
>>> np.float32('{:.9g}'.format(x)) == x
True
The relevance to your problem is that there's always at least one 9-digit string that rounds to x, so we never have to look beyond 9 digits.
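As a quick sanity check of the 9-digit property, here's a sketch that samples random positive normal float32 bit patterns and round-trips each through a 9-digit string. It uses the standard struct module rather than NumPy; the helper names are mine, and it assumes struct.pack rounds to nearest when narrowing a Python float to 4 bytes.

```python
import random
import struct


def float32_from_bits(bits):
    """Interpret a 32-bit integer as an IEEE 754 binary32 value (as a Python float)."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_float32(value):
    """Round a Python float to the nearest float32 (assumes round-to-nearest)."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def nine_digits_round_trip(x):
    """True if formatting x to 9 significant digits and parsing back recovers x."""
    return to_float32(float("{:.9g}".format(x))) == x


rng = random.Random(12345)
samples = []
for _ in range(10000):
    exponent = rng.randrange(1, 255)   # biased exponents 1..254: positive normal values
    fraction = rng.getrandbits(23)
    samples.append(float32_from_bits((exponent << 23) | fraction))

print(all(nine_digits_round_trip(x) for x in samples))
```

Note that parsing goes string to binary64 to binary32 here, which is two roundings rather than one. For 9-digit strings produced from a float32 that shouldn't matter, because the string sits well inside the interval of values that round to x.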
Now we can follow the same approach that you used in your answer: first try for a 6-digit string, then a 7-digit, then an 8-digit. If none of those work, the 9-digit string surely will, by the above. Here's some code.
import numpy as np

def original_string(x):
    for places in range(6, 10):  # try 6, 7, 8, 9
        s = '{:.{}g}'.format(x, places)
        y = np.float32(s)
        if x == y:
            return s
    # If x was genuinely a float32, we should never get here.
    raise RuntimeError("We should never get here")
Example outputs:
>>> original_string(0.02500000037252903)
'0.025'
>>> original_string(0.03999999910593033)
'0.04'
>>> original_string(0.05000000074505806)
'0.05'
>>> original_string(0.30000001192092896)
'0.3'
>>> original_string(0.9800000190734863)
'0.98'
However, the above comes with several caveats.
First, for the key properties we're using to be true, we have to assume that np.float32 always does correct rounding. That may or may not be the case, depending on the operating system. (Even in cases where the relevant operating system calls claim to be correctly rounded, there may still be corner cases where that claim fails to be true.) In practice, it's likely that np.float32 is close enough to correctly rounded not to cause issues, but for complete confidence you'd want to know that it was correctly rounded.
Second, the above won't work for values in the subnormal range (so for float32, anything smaller than 2**-126). In the subnormal range, it's no longer true that a 6-digit decimal numeric string will roundtrip correctly through a single-precision float. If you care about subnormals, you'd need to do something more sophisticated there.
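To make the subnormal failure concrete, here's a small sketch using exact arithmetic. The helper nearest_subnormal_float32 is a hypothetical stand-in for np.float32(text) that is only valid below 2**-126, where float32 values are simply the multiples of 2**-149.

```python
from decimal import Decimal
from fractions import Fraction

# Subnormal float32 values are the multiples of 2**-149 below 2**-126.
SUBNORMAL_STEP = Fraction(1, 2**149)


def nearest_subnormal_float32(text):
    """Round a decimal string in the subnormal range to the nearest float32.

    Hypothetical stand-in for np.float32(text), valid only below 2**-126.
    """
    n = round(Fraction(Decimal(text)) / SUBNORMAL_STEP)  # ties go to even
    return float(n * SUBNORMAL_STEP)                     # exact in binary64


original = "1e-45"
recovered = "{:.6g}".format(nearest_subnormal_float32(original))
print(original, "->", recovered)
print(Decimal(recovered) == Decimal(original))
```

The one-digit string '1e-45' lands on the smallest subnormal, 2**-149, and formatting that back to 6 digits gives a different decimal value.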
Third, there's a really subtle (and interesting!) error in the above that almost doesn't matter at all. The string formatting we're using always rounds x to the nearest places-digit decimal string to the true value of x. However, we want to know simply whether there's any places-digit decimal string that rounds back to x. We're implicitly assuming the (seemingly obvious) fact that if there's any places-digit decimal string that rounds to x, then the closest places-digit decimal string rounds to x. And that's almost true: it follows from the property that the interval of all real numbers that round to x is symmetric around x. But that symmetry property fails in one particular case, namely when x is a power of 2.
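The asymmetry is easy to see if you compute how far the interval of reals that round to x extends on each side. Here's a minimal sketch; rounding_half_widths is a name I made up, and it assumes x is a positive normal float32 value stored in a Python float.

```python
import math
from fractions import Fraction


def rounding_half_widths(x):
    """Return (below, above): how far the set of reals rounding to x extends
    on each side of x.

    Assumes x is a positive normal float32 value held in a Python float.
    """
    m, e = math.frexp(x)               # x == m * 2**e with 0.5 <= m < 1
    above = Fraction(2) ** (e - 25)    # half the gap to the next float32 up
    # Just below a power of two the float32 grid is twice as fine, except at
    # 2**-126, where the subnormals below have the same spacing.
    is_power_of_two = (m == 0.5)
    below = above / 2 if is_power_of_two and x > 2.0 ** -126 else above
    return below, above


for x in (1.5, 1.0, 3.0, 4.0):
    below, above = rounding_half_widths(x)
    print(x, "symmetric" if below == above else "asymmetric")
```

For 1.5 and 3.0 both sides have the same width; for 1.0 and 4.0 the lower side is only half as wide as the upper side.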
So when x is an exact power of 2, it's possible (but fairly unlikely) that (for example) the closest 8-digit decimal string to x doesn't round to x, but nevertheless there is an 8-digit decimal string that does round to x. You can do an exhaustive search for cases where this happens within the range of a float32, and it turns out that there are exactly three values of x for which this occurs, namely x = 2**-96, x = 2**87 and x = 2**90. For 7 digits, there are no such values. (And for 6 and 9 digits, this can never happen.) Let's take a closer look at the case x = 2**87:
>>> x = 2.0**87
>>> x
1.5474250491067253e+26
Let's take the closest 8-digit decimal value to x:
>>> s = '{:.8g}'.format(x)
>>> s
'1.547425e+26'
It turns out that this value doesn't round back to x:
>>> np.float32(s) == x
False
But the next 8-digit decimal string up from it does:
>>> np.float32('1.5474251e+26') == x
True
Similarly, here's the case x = 2**-96:
>>> x = 2**-96.
>>> x
1.262177448353619e-29
>>> s = '{:.8g}'.format(x)
>>> s
'1.2621774e-29'
>>> np.float32(s) == x
False
>>> np.float32('1.2621775e-29') == x
True
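Since the symmetry only breaks at powers of 2, the exhaustive search mentioned earlier can be restricted to those. Here's a sketch of such a search using exact arithmetic instead of np.float32; the function names are hypothetical, and it deliberately skips 2**-126, where the interval is symmetric because the subnormals below it have the same spacing.

```python
import math
from decimal import Context, Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_EVEN
from fractions import Fraction


def rounds_to(d, x):
    """True if the Decimal d lies among the reals that round to the float32 x.

    Assumes x is a positive normal power of two above 2**-126, so the gap
    below x is half the gap above it and ties resolve to x (even significand).
    """
    _, e = math.frexp(x)                 # x == 0.5 * 2**e
    half_above = Fraction(2) ** (e - 25)
    half_below = half_above / 2
    return Fraction(x) - half_below <= Fraction(d) <= Fraction(x) + half_above


def awkward_powers(places):
    """Exponents n where the nearest places-digit decimal to 2**n misses it
    but a neighbouring places-digit decimal rounds to it."""
    found = []
    for n in range(-125, 128):
        x = 2.0 ** n
        exact = Decimal(x)               # exact decimal expansion of x
        nearest = Context(prec=places, rounding=ROUND_HALF_EVEN).create_decimal(exact)
        below = Context(prec=places, rounding=ROUND_FLOOR).create_decimal(exact)
        above = Context(prec=places, rounding=ROUND_CEILING).create_decimal(exact)
        if not rounds_to(nearest, x) and (rounds_to(below, x) or rounds_to(above, x)):
            found.append(n)
    return found


for places in (7, 8):
    print(places, awkward_powers(places))
```

If the search result quoted earlier is right, the 8-digit line should list the exponents -96, 87 and 90, and the 7-digit line should be empty.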
So ignoring subnormals and overflows, out of all 2 billion or so positive normal single-precision values, there are precisely three values x for which the above code doesn't work. (Note: I originally thought there was just one; thanks to @RickRegan for pointing out the error in comments.) So here's our (slightly tongue-in-cheek) fixed code:
import numpy as np

def original_string(x):
    """
    Given a single-precision positive normal value x,
    return the shortest decimal numeric string which produces x.
    """
    # Deal with the three awkward cases.
    if x == 2**-96.:
        return '1.2621775e-29'
    elif x == 2**87:
        return '1.5474251e+26'
    elif x == 2**90:
        return '1.2379401e+27'
    for places in range(6, 10):  # try 6, 7, 8, 9
        s = '{:.{}g}'.format(x, places)
        y = np.float32(s)
        if x == y:
            return s
    # If x was genuinely a float32, we should never get here.
    raise RuntimeError("We should never get here")