
I am depending on some code that uses the Decimal class because it needs precision to a certain number of decimal places. Some of the functions allow inputs to be floats because of the way that it interfaces with other parts of the codebase. To convert them to decimal objects, it uses things like

mydec = decimal.Decimal(str(x))

where x is the float taken as input. My question is, does anyone know what the standard is for the 'str' method as applied to floats?

For example, take the number 2.1234512. It is stored internally as 2.12345119999999999 because of how floats are represented.

>>> x = 2.12345119999999999
>>> x
2.1234511999999999
>>> str(x)
'2.1234512'

OK, str(x) in this case is doing something like '%.7f' % x. This is a problem with the way my code converts to decimals. Take the following:

>>> d = decimal.Decimal('2.12345119999999999')
>>> ds = decimal.Decimal(str(2.12345119999999999))
>>> d - ds
Decimal('-1E-17')

So if I have the float 2.12345119999999999 and want to pass it to Decimal, converting it to a string with str() gives me the wrong answer. I need to know what rules determine the formatting of str(x), so that I can decide whether this code needs to be rewritten to avoid the error. (It might be OK as is: for example, the code might round to the 10th decimal place once it has a Decimal object.)

There must be some set of rules in python's docs that hopefully someone here can point me to. Thanks!

njnnja
  • `str(x)` returns `x.__str__()` or `x.__repr__()` if the former doesn't exist - what those are is entirely up to the object at hand. – Gareth Latty Feb 25 '13 at 18:32
  • `str` generally returns a more human-readable representation. If you want the full accuracy, you can use `repr`, which will return `'2.1234511999999999'`. – tobias_k Feb 25 '13 at 18:34
  • See [Strange behaviour with floats and string conversion](http://stackoverflow.com/a/13346122/222914) – Janne Karila Feb 25 '13 at 18:34
  • Then again, if your original number is `2.1234512`, and `2.12345119999999999` is just how it is stored internally, isn't the 'shortened' representation of `str` actually more precise? – tobias_k Feb 25 '13 at 18:38
  • @tobias_k: yes, I happened upon this feature looking at the number 2.1234512, but I could have been looking at 2.123451199999999999. – njnnja Feb 25 '13 at 18:46

2 Answers


In the Python source, look in "Include/floatobject.h". The precision for the string conversion is set a few lines from the top, after a comment that explains the choice:

/* The str() precision PyFloat_STR_PRECISION is chosen so that in most cases,
   the rounding noise created by various operations is suppressed, while
   giving plenty of precision for practical use. */

#define PyFloat_STR_PRECISION 12

You have the option of rebuilding, if you need something different. Any changes will change formatting of floats and complex numbers. See ./Objects/complexobject.c and ./Objects/floatobject.c. Also, you can compare the difference between how repr and str convert doubles in these two files.
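
As a rough illustration of what that constant means in practice, the two legacy conversions can be approximated with `%`-formatting: `'%.12g'` keeps 12 significant digits, which is what `str` did for floats in Python 2, and `'%.17g'` keeps 17, which is what `repr` did in the session shown in the question. The helper names `legacy_str` and `legacy_repr` are made up for this sketch; it imitates the C code rather than calling it, and it ignores details such as the trailing `.0` that Python adds to whole numbers.

```python
def legacy_str(x):
    # Hypothetical helper: approximates Python 2's str(float),
    # i.e. PyFloat_STR_PRECISION = 12 significant digits.
    return '%.12g' % x


def legacy_repr(x):
    # Hypothetical helper: approximates the older repr(float),
    # which used 17 significant digits.
    return '%.17g' % x


x = 2.12345119999999999
print(legacy_str(x))   # 2.1234512
print(legacy_repr(x))  # 2.1234511999999999
```

Seventeen significant digits are always enough to recover the same double, so `float(legacy_repr(x)) == x` holds, whereas the 12-digit form only round-trips when the float happens to be the nearest double to a short decimal.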

DrSkippy
  • Note that this number changed between Python 2.6 and Python 2.7. If you want maximal precision, you should use `repr()` instead of `str()`, which will always give you a representation that will represent the number as accurately as possible within the limits of double precision. – Sven Marnach Feb 25 '13 at 19:29
  • Rebuilding Python certainly is *not* the right way to deal with this issue. – Sven Marnach Feb 25 '13 at 19:29
  • @SvenMarnach: `PyFloat_STR_PRECISION` hasn't changed in the lifetime of Python 2, as far as I'm aware: it's `12` for both 2.6 and 2.7. There were some minor changes to do with when the stringification of a large value switches to scientific notation, but the `str` of a float is still based on the 12 most significant digits of the decimal expansion. Where this *did* change is in Python 3.2 and later, where `repr` and `str` are now identical for floats. – Mark Dickinson Feb 26 '13 at 09:22
  • @MarkDickinson: Thanks for the correction. I seemed to remember that this change was applied in 3.1 and 2.7, but my memory was apparently wrong. :) Maybe it was the minimal representation thing instead... – Sven Marnach Feb 26 '13 at 11:16
  • @SvenMarnach: Yes: the new algorithms for float->string and string->float conversions went into Python 3.1, and then (later) in Python 2.7, with the difference in `repr` output being the most user-visible part of the change. In theory, there could be corner cases where this affected the `str` as well (e.g., near-halfway cases where the OS wasn't quite doing correct rounding); in practice, I don't know of any such corner cases. – Mark Dickinson Feb 26 '13 at 11:35

There are a couple of issues worth discussing here, but the summary is: you cannot extract information that is not already stored on your system.

If you've taken a decimal number and stored it as a floating-point value, you'll have lost information, since most decimal (base 10) numbers with a finite number of digits cannot be stored using a finite number of digits in base 2 (binary).
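
To see that loss directly, here is a small sketch that compares the decimal you typed with the value the float actually holds. It assumes a Python version in which `Decimal` accepts a float directly (2.7 or 3.2 and later, as far as I know); that constructor converts the binary value exactly, without rounding.

```python
from decimal import Decimal

x = 2.1234512                      # the decimal number we meant
exact = Decimal(x)                 # the binary value actually stored, converted exactly
intended = Decimal('2.1234512')    # the decimal number, built from its string

print(exact == intended)                          # False
print(abs(exact - intended) < Decimal('1e-15'))   # True: off by less than one float step
```

The difference is tiny, but it is already baked into the float before any string conversion happens.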

As was mentioned, str(a_float) will really call a_float.__str__(). As the documentation states, the purpose of that method is to

return a string containing a nicely printable representation of an object

There's no particular definition for the float case. My opinion is that, for your purposes, you should consider __str__'s behavior to be undefined, since there's no official documentation on it - the current implementation can change anytime.
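
If what you want is to carry over exactly the float you were handed, rather than a "nice" rendering of it, the comments under the question point to `repr` instead. The sketch below checks only the round trip, not the specific digits, because the text `repr` produces differs between Python versions.

```python
from decimal import Decimal

x = 2.12345119999999999
via_repr = Decimal(repr(x))     # repr is meant to identify the float unambiguously

# Converting back gives the same float, so nothing further was lost.
print(float(via_repr) == x)     # True
```

This still cannot tell you which decimal literal the float originally came from; it only avoids discarding digits on top of that.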

If you don't have the original strings, there's no way to extract the missing digits of the decimal representation from the float objects. All you can do is round predictably, using string formatting (which you mention):

Decimal("{0:.5f}".format(a_float))

You can also remove 0s on the right with resulting_string.rstrip("0"). Again, this method does not recover the information that has been lost.
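
Putting the two steps together, a minimal sketch might look like this. The function name `float_to_decimal` and its `places` parameter are assumptions made for the example, not part of any library.

```python
from decimal import Decimal


def float_to_decimal(a_float, places=5):
    # Hypothetical helper: round to a fixed number of decimal places
    # with string formatting, then build the Decimal from that string.
    text = "{0:.{1}f}".format(a_float, places)
    if "." in text:
        # Drop trailing zeros, and the dot itself if nothing follows it.
        text = text.rstrip("0").rstrip(".")
    return Decimal(text)


print(float_to_decimal(2.1234512))      # 2.12345
print(float_to_decimal(2.5))            # 2.5
print(float_to_decimal(2.1234512, 7))   # 2.1234512
```

Because the number of places is chosen explicitly, the result does not depend on how `str` happens to format floats in a given Python version.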

loopbackbee