
I've run into an issue displaying float values in Python, loaded from an external data source
(they're 32-bit floats, but this would apply to lower-precision floats too).

(In case it's important: these values were typed in by humans in C/C++, so unlike arbitrary calculated values, deviations from round numbers are likely not intended, though they can't be ignored, since the values may be constants such as M_PI, or round values multiplied by constants.)

Since CPython uses higher precision (typically 64-bit), a value entered as a lower-precision float may repr() with visible precision loss from having been stored as a 32-bit float, where the same value stored as a 64-bit float would display as a round number.

e.g.:

# Examples of 32-bit floats displayed as 64-bit floats in CPython.
0.0005 -> 0.0005000000237487257
0.025  -> 0.02500000037252903
0.04   -> 0.03999999910593033
0.05   -> 0.05000000074505806
0.3    -> 0.30000001192092896
0.98   -> 0.9800000190734863
1.2    -> 1.2000000476837158
4096.3 -> 4096.2998046875

Simply rounding the values to some arbitrary precision works in most cases, but may be incorrect, since it could lose significant digits from values such as 0.00000001.
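For instance, rounding to a fixed number of decimal places can wipe out a small but significant value entirely:

>>> round(0.00000001, 6)
0.0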

An example of this can be shown by printing a float round-tripped through a 32-bit float.

def as_float_32(f):
    """Round-trip f through a 32-bit float (returns the nearest float32 value)."""
    from struct import pack, unpack
    return unpack("f", pack("f", f))[0]

print(0.025)               #  --> 0.025
print(as_float_32(0.025))  #  --> 0.02500000037252903

So my question is:

What's the most efficient and straightforward way to get the original representation of a 32-bit float, without making assumptions or losing precision?

Put differently: I have a data source containing 32-bit floats. These were originally entered by a human as round values (examples above), but representing them as higher-precision values exposes that each 32-bit float is an approximation of the original value.

I would like to reverse this process and get the round number back from the 32-bit float data, but without losing the precision which a 32-bit float gives us (which is why simply rounding isn't a good option).


Examples of why you might want to do this:

  • Generating API documentation where Python extracts values from a C API that uses single-precision floats internally.
  • When people need to read or review generated data which happens to be provided as single-precision floats.

In both cases it's important not to lose significant precision, or to show values which can't be easily read by humans at a glance.


  • Update: I've made a solution which I'll include as an answer (for reference and to show it's possible), but I highly doubt it's an efficient or elegant solution.

  • Of course you can't know which notation was used: 0.1f, 0.1F or 1e-1f; that's not the purpose of this question.

ideasman42
  • You'd probably want a modified version of `dtoa.c`, which is part of the Python source code, but is also found in a number of other projects. This is the code that prints 0.1 as 0.1 instead of 0.1000000000000000055511151231257827021181583404541015625. – Dietrich Epp Feb 29 '16 at 01:14
  • @DietrichEpp, good to know, however for existing Python-only scripts, including C code isn't always an attractive option (assuming the project doesn't already use C-extensions). – ideasman42 Feb 29 '16 at 04:47
  • Just pointing out the possibilities. The `dtoa.c` file has been copied to many projects and extensively tested, it will generally be easier to modify it than to try to kludge together your own solution, which probably won't handle edge cases properly. There are *lots* of edge cases. – Dietrich Epp Feb 29 '16 at 04:50
  • You could even port `dtoa.c` to Python, if you like. – Dietrich Epp Feb 29 '16 at 04:52
  • AFAICS my own solution http://stackoverflow.com/a/35690179/432509 (while not optimal) is correct. So this can be solved without re-implementing dtoa, however, for the most optimal solution, a modified dtoa would be best I guess. – ideasman42 Feb 29 '16 at 05:28
  • Why so many downvotes (3 at the moment) on this question? – ShreevatsaR May 05 '16 at 00:42

6 Answers


You're looking to solve essentially the same problem that Python's repr solves, namely, finding the shortest decimal string that rounds to a given float. Except that in your case, the float isn't an IEEE 754 binary64 ("double precision") float, but an IEEE 754 binary32 ("single precision") float.

Just for the record, I should of course point out that retrieving the original string representation is impossible, since for example the strings '0.10', '0.1', '1e-1' and '10e-2' all get converted to the same float (or in this case float32). But under suitable conditions we can still hope to produce a string that has the same decimal value as the original string, and that's what I'll do below.
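A quick illustration of that, using np.float32 for the string-to-float32 conversion:

>>> import numpy as np
>>> np.float32('0.10') == np.float32('0.1') == np.float32('1e-1')
True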

The approach you outline in your answer more-or-less works, but it can be streamlined a bit.

First, some bounds: when it comes to decimal representations of single-precision floats, there are two magic numbers: 6 and 9. The significance of 6 is that any (not-too-large, not-too-small) decimal numeric string with 6 or fewer significant decimal digits will round-trip correctly through a single-precision IEEE 754 float: that is, converting that string to the nearest float32, and then converting that value back to the nearest 6-digit decimal string, will produce a string with the same value as the original. For example:

>>> import numpy as np
>>> x = "634278e13"
>>> y = float(np.float32(x))
>>> y
6.342780214942106e+18
>>> "{:.6g}".format(y)
'6.34278e+18'

(Here, by "not-too-large, not-too-small" I just mean that the underflow and overflow ranges of float32 should be avoided. The property above applies for all normal values.)

This means that for your problem, if the original string had 6 or fewer digits, we can recover it by simply formatting the value to 6 significant digits. So if you only care about recovering strings that had 6 or fewer significant decimal digits in the first place, you can stop reading here: a simple '{:.6g}'.format(x) is enough. If you want to solve the problem more generally, read on.
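For instance, applying that to two of the values from the question:

>>> '{:.6g}'.format(0.02500000037252903)
'0.025'
>>> '{:.6g}'.format(0.9800000190734863)
'0.98'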

For roundtripping in the other direction, we have the opposite property: given any single-precision float x, converting that float to a 9-digit decimal string (rounding to nearest, as always), and then converting that string back to a single-precision float, will always exactly recover the value of that float.

>>> x = np.float32(3.14159265358979)
>>> x
3.1415927
>>> np.float32('{:.9g}'.format(x)) == x
True

The relevance to your problem is that there's always at least one 9-digit string that rounds to x, so we never have to look beyond 9 digits.

Now we can follow the same approach that you used in your answer: first try for a 6-digit string, then a 7-digit, then an 8-digit. If none of those work, the 9-digit string surely will, by the above. Here's some code.

import numpy as np

def original_string(x):
    for places in range(6, 10):  # try 6, 7, 8, 9
        s = '{:.{}g}'.format(x, places)
        y = np.float32(s)
        if x == y:
            return s
    # If x was genuinely a float32, we should never get here.
    raise RuntimeError("We should never get here")

Example outputs:

>>> original_string(0.02500000037252903)
'0.025'
>>> original_string(0.03999999910593033)
'0.04'
>>> original_string(0.05000000074505806)
'0.05'
>>> original_string(0.30000001192092896)
'0.3'
>>> original_string(0.9800000190734863)
'0.98'
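As an aside, if you'd rather avoid the numpy dependency, the same search works with the struct-based round-trip from the question; a minimal sketch (with the same caveats that follow):

from struct import pack, unpack

def original_string_no_numpy(f):
    # Same 6-to-9-digit search, using pack/unpack as the float32
    # round-trip in place of np.float32.
    as_f32 = lambda v: unpack("f", pack("f", v))[0]
    x = as_f32(f)
    for places in range(6, 10):
        s = '{:.{}g}'.format(x, places)
        if as_f32(float(s)) == x:
            return s
    raise RuntimeError("We should never get here")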

However, the above comes with several caveats.

  • First, for the key properties we're using to be true, we have to assume that np.float32 always does correct rounding. That may or may not be the case, depending on the operating system. (Even in cases where the relevant operating system calls claim to be correctly rounded, there may still be corner cases where that claim fails to be true.) In practice, it's likely that np.float32 is close enough to correctly rounded not to cause issues, but for complete confidence you'd want to know that it was correctly rounded.

  • Second, the above won't work for values in the subnormal range (so for float32, anything smaller than 2**-126). In the subnormal range, it's no longer true that a 6-digit decimal numeric string will roundtrip correctly through a single-precision float. If you care about subnormals, you'd need to do something more sophisticated there.

  • Third, there's a really subtle (and interesting!) error in the above that almost doesn't matter at all. The string formatting we're using always produces the places-digit decimal string closest to the true value of x. However, we want to know simply whether there's any places-digit decimal string that rounds back to x. We're implicitly assuming the (seemingly obvious) fact that if there's any places-digit decimal string that rounds to x, then the closest places-digit decimal string rounds to x. And that's almost true: it follows from the property that the interval of all real numbers that rounds to x is symmetric around x. But that symmetry property fails in one particular case, namely when x is a power of 2.

So when x is an exact power of 2, it's possible (but fairly unlikely) that (for example) the closest 8-digit decimal string to x doesn't round to x, but nevertheless there is an 8-digit decimal string that does round to x. You can do an exhaustive search for cases where this happens within the range of a float32, and it turns out that there are exactly three values of x for which this occurs, namely x = 2**-96, x = 2**87 and x = 2**90. For 7 digits, there are no such values. (And for 6 and 9 digits, this can never happen.) Let's take a closer look at the case x = 2**87:

>>> x = 2.0**87
>>> x
1.5474250491067253e+26

Let's take the closest 8-digit decimal value to x:

>>> s = '{:.8g}'.format(x)
>>> s
'1.547425e+26'

It turns out that this value doesn't round back to x:

>>> np.float32(s) == x
False

But the next 8-digit decimal string up from it does:

>>> np.float32('1.5474251e+26') == x
True

Similarly, here's the case x = 2**-96:

>>> x = 2**-96.
>>> x
1.262177448353619e-29
>>> s = '{:.8g}'.format(x)
>>> s
'1.2621774e-29'
>>> np.float32(s) == x
False
>>> np.float32('1.2621775e-29') == x
True

So ignoring subnormals and overflows, out of all 2 billion or so positive normal single-precision values, there are precisely three values x for which the above code doesn't work. (Note: I originally thought there was just one; thanks to @RickRegan for pointing out the error in comments.) So here's our (slightly tongue-in-cheek) fixed code:

def original_string(x):
    """
    Given a single-precision positive normal value x,
    return the shortest decimal numeric string which produces x.
    """
    # Deal with the three awkward cases.
    if x == 2**-96.:
        return '1.2621775e-29'
    elif x == 2**87:
        return '1.5474251e+26'
    elif x == 2**90:
        return '1.2379401e+27'

    for places in range(6, 10):  # try 6, 7, 8, 9
        s = '{:.{}g}'.format(x, places)
        y = np.float32(s)
        if x == y:
            return s
    # If x was genuinely a float32, we should never get here.
    raise RuntimeError("We should never get here")
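For completeness, here's a rough sketch of the exhaustive search over powers of two described above, assuming correctly rounded formatting and np.float32 string conversion (the function name is invented for illustration):

import numpy as np
from decimal import Decimal

def awkward_powers_of_two(digits=8):
    # For each power of two in the float32 normal range, check whether
    # the *nearest* `digits`-digit decimal fails to round back to it
    # while an immediately adjacent decimal succeeds.
    hits = []
    for e in range(-126, 128):
        x = np.float32(2.0 ** e)
        s = '{:.{}e}'.format(float(x), digits - 1)  # nearest decimal string
        if np.float32(s) == x:
            continue
        d = Decimal(s)
        # Step one unit in the last decimal place in each direction.
        step = Decimal(1).scaleb(d.adjusted() - (digits - 1))
        for neighbour in (d - step, d + step):
            if np.float32(str(neighbour)) == x:
                hits.append((e, str(neighbour)))
    return hits

print(awkward_powers_of_two())  # expect hits for e = -96, 87 and 90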
Mark Dickinson
  • Mark, great answer (just got around to reading it). Do you have an example where this happens for double-precision (to 16 digits)? (I could check myself but wanted to know if you've done it already.) I'll have to go back and look at the Steele & White, David Gay, etc. papers to see if they talk about this. – Rick Regan Apr 11 '16 at 17:40
  • @RickRegan: No, I hadn't checked, but I have now! I count 54 powers of two `x` within the binary64 normal range for which the closest 16-digit decimal doesn't round to `x`, but there is nevertheless a 16-digit decimal value which *does* round to `x`. Of those 54, 8 have a 15-digit representation, leaving 46 "problem" numbers. The smallest is `2**-1017`, the largest `2**976`, and the one with smallest exponent size is `2**-24`. I did the calculation two different ways (one using fractions, one using Python's float machinery), but independent confirmation would be great. – Mark Dickinson Apr 11 '16 at 19:47
  • I wrote a C program using David Gay's routines that also found 46 cases: smallest negative power: 2^-1017; largest negative power: 2^-24; smallest positive power: 2^89; largest positive power: 2^976. (This was interesting -- thanks.) – Rick Regan Apr 12 '16 at 01:48
  • Mark, intuitively I can see why this can only happen for the "middle" digit counts (7, 8 for single and 16 for double), but is there a (simple) proof? It seems that at least a necessary condition is that the decimal gap size around a given power of two be between the binary gap sizes on either side of the power of two. – Rick Regan Apr 14 '16 at 03:17
  • @RickRegan: It follows directly from the two roundtrip results (and there's a simple proof of those that works without modification for power-of-two edge cases, but it's a bit long for a comment). For a given float `x`, we're looking for cases where there's an n-digit decimal that rounds to `x`, but the *closest* n-digit decimal doesn't round to `x`. For 6 digits that can't happen because all 6-digit decimals round to *different* `float`s (else roundtripping wouldn't work). For 9 digits it can't happen because the closest 9-digit decimal *always* rounds to x (from the roundtripping, again). – Mark Dickinson Apr 14 '16 at 06:25
  • I know you're right but it doesn't satisfy me like a lower-level "gaps based" proof would (of course round-tripping is based on gap sizes, so that's embedded in your argument). For 6 or fewer digits, the decimal values are so far apart relative to the floats that only the nearest could map to x (if any do). For 9 or more digits, the decimal values are so close together that the nearest has to map to x (even if further away ones do as well). For 7-8 digits, the change in relative gap size at powers of two messes us up. That is not a proof, but maybe the overview of one. I'll think more about it – Rick Regan Apr 14 '16 at 15:13
  • Re: your proof for the round-trip result: I wrote a series of articles last year (rooted at http://www.exploringbinary.com/the-inequality-that-governs-round-trip-conversions-a-partial-proof/ ) laying out a proof, or at least "half" of one. If you are so inclined maybe you can sketch your proof as a comment on my blog? Thanks. – Rick Regan Apr 14 '16 at 15:15
  • @RickRegan: Yes, I may do that; I keep meaning to write it up somewhere, since I have to reconstruct it every time I need it. I'm out of both time and energy at the moment, though. – Mark Dickinson Apr 18 '16 at 20:05
  • Not trying to beat this to death (but I am writing an article about it) but I can't replicate your float results. You say there are 7-digit decimal strings that work for x = 2^87 and 2^90, but I didn't find any (nearest or otherwise). – Rick Regan Apr 21 '16 at 01:17
  • @RickRegan: Whoops, I think you're right; good catch. I'll update the answer after work. – Mark Dickinson Apr 21 '16 at 07:32
  • @RickRegan: Fixed. – Mark Dickinson Apr 22 '16 at 17:32
  • My article about this: http://www.exploringbinary.com/the-shortest-decimal-string-that-round-trips-may-not-be-the-nearest/ – Rick Regan Apr 28 '16 at 17:14
---

I think Decimal.quantize() (to round to a given number of decimal digits) and .normalize() (to strip trailing 0's) are what you need.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from decimal import Decimal

data = (
    0.02500000037252903,
    0.03999999910593033,
    0.05000000074505806,
    0.30000001192092896,
    0.9800000190734863,
    )

for f in data:
    dec = Decimal(f).quantize(Decimal('1.0000000')).normalize()
    print("Original %s -> %s" % (f, dec))

Result:

Original 0.0250000003725 -> 0.025
Original 0.0399999991059 -> 0.04
Original 0.0500000007451 -> 0.05
Original 0.300000011921 -> 0.3
Original 0.980000019073 -> 0.98
John Carter
  • What's not clear from this answer (and the documentation links) is how you would use these functions to perform this operation, ensuring significant values supported by 32-bit precision would not be lost. – ideasman42 Feb 29 '16 at 04:49
  • @ideasman42 true, my answer would fail for numbers that are much larger or smaller than your example. – John Carter Feb 29 '16 at 09:11
  • I think this can be fixed by calculating the quantize value to give the right number of decimal places of precision - approximately 6 - see http://stackoverflow.com/questions/10484332/how-to-calculate-decimal-digits-of-precision-based-on-the-number-of-bits Note that special handling will be needed for very large numbers, since the floating point imprecision will be before the decimal point. – John Carter Feb 29 '16 at 19:26
  • Tested and while this works well for small numbers, it fails for `4096.2998047` for example. – ideasman42 Mar 02 '16 at 02:36
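Following up on the comment thread, here's a minimal sketch of the magnitude-aware fix: Decimal's context precision counts significant digits rather than decimal places, so large values like 4096.2998046875 are handled too. The helper name float32_decimal_repr is made up, and the float32 round-trip uses the question's pack/unpack trick:

from decimal import Context
from struct import pack, unpack

def as_float_32(f):
    return unpack("f", pack("f", f))[0]

def float32_decimal_repr(f):
    # Round to 6..9 significant digits and return the first result
    # that still maps back to the same float32.
    for prec in range(6, 10):
        d = Context(prec=prec).create_decimal(repr(f))
        if as_float_32(float(d)) == as_float_32(f):
            return d.normalize()
    return repr(f)

print(float32_decimal_repr(4096.2998046875))      # -> 4096.3
print(float32_decimal_repr(0.02500000037252903))  # -> 0.025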
---

Here's a solution I've come up with which works (perfectly, as far as I can tell) but isn't efficient.

It works by rounding to an increasing number of decimal places, and returning the string when the rounded and non-rounded inputs match (compared as values converted to lower precision).

Code:

def round_float_32(f):
    """Round-trip f through a 32-bit float."""
    from struct import pack, unpack
    return unpack("f", pack("f", f))[0]


def as_float_low_precision_repr(f, round_fn):
    """Return the shortest fixed-point string for f that still maps to the
    same low-precision value as f itself (compared via round_fn)."""
    f_round = round_fn(f)
    f_str = repr(f)
    f_str_frac = f_str.partition(".")[2]
    if not f_str_frac:
        return f_str
    # Round to 1, 2, 3... decimal places until the result is
    # indistinguishable from the input at low precision.
    for i in range(1, len(f_str_frac)):
        f_test = round(f, i)
        f_test_round = round_fn(f_test)
        if f_test_round == f_round:
            return "%.*f" % (i, f_test)
    return f_str

# ----

data = (
    0.02500000037252903,
    0.03999999910593033,
    0.05000000074505806,
    0.30000001192092896,
    0.9800000190734863,
    1.2000000476837158,
    4096.2998046875,
    )

for f in data:
    f_as_float_32 = as_float_low_precision_repr(f, round_float_32)
    print("%s -> %s" % (f, f_as_float_32))

Outputs:

0.02500000037252903 -> 0.025
0.03999999910593033 -> 0.04
0.05000000074505806 -> 0.05
0.30000001192092896 -> 0.3
0.9800000190734863 -> 0.98
1.2000000476837158 -> 1.2
4096.2998046875 -> 4096.3
ideasman42
---

If you have at least NumPy 1.14.0, you can just use repr(numpy.float32(your_value)). Quoting the release notes:

Float printing now uses “dragon4” algorithm for shortest decimal representation

The str and repr of floating-point values (16, 32, 64 and 128 bit) are now printed to give the shortest decimal representation which uniquely identifies the value from others of the same type. Previously this was only true for float64 values. The remaining float types will now often be shorter than in numpy 1.13.

Here's a demo running against a few of your example values:

>>> import numpy
>>> repr(numpy.float32(0.0005000000237487257))
'0.0005'
>>> repr(numpy.float32(0.02500000037252903))
'0.025'
>>> repr(numpy.float32(0.03999999910593033))
'0.04'
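If you want the shortest digits as a plain string rather than going through repr, NumPy ≥ 1.14 also exposes the same shortest-representation machinery directly via numpy.format_float_positional (and numpy.format_float_scientific); for instance:

>>> numpy.format_float_positional(numpy.float32(0.3))
'0.3'
>>> numpy.format_float_positional(numpy.float32(4096.2998046875))
'4096.3'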
user2357112
---

At least in Python 3 you can use float.as_integer_ratio(). That's not exactly a string, but the floating-point format as such is not really well suited to giving an exact representation in "finite" strings.

>>> a = 0.1
>>> a.as_integer_ratio()
(3602879701896397, 36028797018963968)

So by saving these two numbers you'll never lose precision, because together they exactly represent the saved floating-point number. (Just divide the first by the second to get the value.)
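For instance, a quick check that the ratio is exact, using the standard library's fractions module:

>>> from fractions import Fraction
>>> a = 0.1
>>> num, den = a.as_integer_ratio()
>>> Fraction(num, den) == Fraction(a)  # the ratio is exactly the stored value
True
>>> num / den == a                     # dividing recovers the float
True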


As an example using numpy dtypes (very similar to C's float types):

# A value in Python's (double) floating-point precision
a = 0.1
# The value as a ratio of integers
b = a.as_integer_ratio()

import numpy as np
# Force the result to have a specific precision:
res = np.array([0], dtype=np.float16)
np.true_divide(b[0], b[1], res)
print(res)
# Compare with computing 0.1 directly at this precision:
np.true_divide(1, 10, res)
print(res)

# Other precisions:
res = np.array([0], dtype=np.float32)
np.true_divide(b[0], b[1], res)
print(res)
res = np.array([0], dtype=np.float64)
np.true_divide(b[0], b[1], res)
print(res)

The result of all these calculations is:

[ 0.09997559] # Float16 with integer-ratio
[ 0.09997559] # Float16 reference
[ 0.1] # Float32
[ 0.1] # Float64
MSeifert
  • How would you use this to get a rounded, lower-precision float representation? – ideasman42 Feb 29 '16 at 00:42
  • Just evaluate the division of these two numbers with the wanted precision (data-type)? I'm not sure anymore if I understand your question: Do you want to round the value as precise as possible (how would you go about it given that you are doing the calculations with the same precision) or do you want an exact representation of your number? – MSeifert Feb 29 '16 at 00:55
  • Updated my question, and added my own answer to show it's possible. Your answer looks helpful, though it relies on numpy, and I'd rather use vanilla Python if possible. – ideasman42 Feb 29 '16 at 01:15
  • Numpy is just to demonstrate different dtypes, in python itself you cannot change it so any difference would be lost. And about your question: With these two numbers you can compare the order of magnitude (let say 1 / 100) to get the first significant digit. If that still does not give you an appropriate answer just comment and I'll delete the question. I think there were a few misunderstandings on my part. :-) – MSeifert Feb 29 '16 at 10:50
---

Probably what you are looking for is decimal:

Decimal “is based on a floating-point model which was designed with people in mind, and necessarily has a paramount guiding principle – computers must provide an arithmetic that works in the same way as the arithmetic that people learn at school.”
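For instance, a minimal sketch echoing the quantize()/normalize() approach from the other answer:

>>> from decimal import Decimal
>>> Decimal(0.02500000037252903).quantize(Decimal('1.000000')).normalize()
Decimal('0.025')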

maxbublis
  • How would decimals be applied to this problem? (Assuming I already have the data in an array which needs to be represented as strings.) – ideasman42 Feb 29 '16 at 00:55
  • @ideasman42 you can use `quantize()` and `normalize()` methods to achieve that – maxbublis Feb 29 '16 at 02:38
  • @therefromhere's answer shows how this can be used. However it's not reliable and fails for some float values. – ideasman42 May 05 '16 at 10:06