Is the algorithm that Python uses to round a float to a given number of digits specified anywhere in the Python documentation? The semantics of round with zero fractional digits (i.e. rounding to an integer) are simple to understand, but it's not clear to me how the case where the number of digits is nonzero is implemented.

The most straightforward implementation of the function that I can think of (given the existence of round to zero fractional digits) would be:

def round_impl(x, ndigits):
    return (10 ** -ndigits) * round(x * (10 ** ndigits))

I'm trying to write some C++ code that mimics the behavior of Python's round() function for all values of ndigits, and the above agrees with Python for the most part, when translated to equivalent C++ calls. However, there are some cases where it differs, e.g.:

>>> round(0.493125, 5)
0.49312
>>> round_impl(0.493125, 5)
0.49313

There is clearly a difference that occurs when the value to be rounded is at or very near the exact midpoint between two potential output values. Therefore, it seems important that I try to use the same technique if I want similar results.

Is the specific means for performing the rounding specified by Python? I'm using CPython 2.7.15 in my tests, but I'm specifically targeting v2.7+.

Jason R
  • Maybe this will help: https://docs.python.org/2.7/tutorial/floatingpoint.html – darc Aug 15 '18 at 15:01
  • I understand the intricacies of floating-point arithmetic; that's why it's so important to be careful when trying to make two separate pieces of software in different languages agree on the same result. However, what I'm trying to determine is whether the actual implementation of `round()` is specified tightly enough so that it can be reliably duplicated elsewhere. – Jason R Aug 15 '18 at 15:05
  • I believe the Python 3 rounding behaviour is quite simple: all else equal, it picks the even choice. The Python 2 behaviour is more complex and probably not something you want to emulate; see https://stackoverflow.com/a/22155830/6260170 – Chris_Rands Aug 15 '18 at 15:08
  • In Python/pymath.c it looks like it uses the floor function. https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Python/pymath.c – darc Aug 15 '18 at 15:09
  • @darc: That is just a definition of the C function `round()`. The Python function `round()` for `float` is defined in https://github.com/python/cpython/blob/master/Objects/floatobject.c and it is a bit more complicated so that it can get the correct result. – Dietrich Epp Aug 15 '18 at 15:34

2 Answers

Also refer to What Every Programmer Should Know About Floating-Point Arithmetic, which has more detailed explanations for why this is happening as it is.

This is a mess. First of all, as far as float is concerned, there is no such number as 0.493125. When you write 0.493125, what you actually get is:

0.493124999999999980015985556747182272374629974365234375

So this number is not exactly between two decimals, it's actually closer to 0.49312 than it is to 0.49313, so it should definitely round to 0.49312, that much is clear.
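The stored value can be inspected directly. This is a minimal sketch using the standard decimal module, which converts a float to its exact decimal expansion without rounding; the variable names are only illustrative.

```python
from decimal import Decimal

# Decimal(float) converts the binary double exactly, with no rounding.
exact = Decimal(0.493125)
print(exact)  # 0.493124999999999980015985556747182272374629974365234375

# Decimal comparisons are exact: the stored value lies below the
# midpoint 0.493125, so 0.49312 is the nearer 5-digit decimal.
print(exact < Decimal("0.493125"))  # True
```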

The problem is that when you multiply by 10**5, you get the exact number 49312.5. So what happened here is the multiplication gave you an inexact result which by coincidence canceled out the rounding error in the original number. Two rounding errors canceled each other out, yay! But this means the rounding comes out wrong, at least when midpoints are rounded away from 0: the naive approach rounds 49312.5 up to 49313 even though the original value was below the midpoint. Python 2 and Python 3 also treat midpoints differently. Python 2 rounds away from 0, and Python 3 rounds toward the even choice.

Python 2

if two multiples are equally close, rounding is done away from 0

Python 3

...if two multiples are equally close, rounding is done toward the even choice...
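The cancellation and the two tie-breaking rules described above can be checked with a short sketch. It uses fractions.Fraction for the exact product and the decimal module's ROUND_HALF_UP (which rounds ties away from 0) and ROUND_HALF_EVEN modes as stand-ins for the Python 2 and Python 3 rules; the variable names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP
from fractions import Fraction

x = 0.493125
scaled = x * 10**5

# The float product lands exactly on the tie...
print(scaled == 49312.5)  # True
# ...although the exact (unrounded) product is below the tie.
print(Fraction(x) * 10**5 < Fraction(49312.5))  # True

# The two tie-breaking rules applied to the exact midpoint.
tie = Decimal("49312.5")
half_away = tie.quantize(Decimal("1"), rounding=ROUND_HALF_UP)    # Python 2 rule
half_even = tie.quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)  # Python 3 rule
print(half_away, half_even)  # 49313 49312
```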

Summary

In Python 2,

>>> round(49312.5)
49313.0
>>> round(0.493125, 5)
0.49312

In Python 3,

>>> round(49312.5)
49312
>>> round(0.493125, 5)
0.49312

And in both cases, 0.493125 is really just a short way of writing 0.493124999999999980015985556747182272374629974365234375.

So, how does it work?

I see two plausible ways for round() to actually behave.

  1. Choose the closest decimal number with the specified number of digits, and then round that decimal number to float precision. This is hard to implement, because it requires doing calculations with more precision than you can get from a float.

  2. Take the two closest decimal numbers with the specified number of digits, round them both to float precision, and return whichever is closer. This will give incorrect results, because it rounds numbers twice.
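As an illustration of option 1, here is a minimal sketch that does the rounding in exact decimal arithmetic with the standard decimal module. The name round_exact is hypothetical, the precision of 1000 digits is an assumption meant to be generous for finite doubles with moderate ndigits, and ties are broken toward even as in the Python 3 documentation (ROUND_HALF_UP would correspond to the Python 2 wording). It is not how CPython implements round().

```python
from decimal import Decimal, ROUND_HALF_EVEN, localcontext

def round_exact(x, ndigits):
    """Hypothetical helper: round a finite float in exact decimal arithmetic."""
    with localcontext() as ctx:
        ctx.prec = 1000  # assumption: enough digits for the inputs of interest
        quantum = Decimal(1).scaleb(-ndigits)  # 10 ** -ndigits, exactly
        # Decimal(x) is the exact value of the double; quantize picks the
        # nearest multiple of the quantum, breaking exact ties toward even.
        nearest = Decimal(x).quantize(quantum, rounding=ROUND_HALF_EVEN)
    # Converting back rounds the decimal to the nearest float.
    return float(nearest)

print(round_exact(0.493125, 5))  # 0.49312
```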

And Python chooses... option #1! The exactly correct, but much harder to implement version. Refer to Objects/floatobject.c:927 double_round(). It uses the following process:

  1. Write the floating-point number to a string in decimal format, using the requested precision.

  2. Parse the string back in as a float.

This uses code based on David Gay's dtoa library. If you want C++ code that gets the actual correct result like Python does, this is a good start. Fortunately you can just include dtoa.c in your program and call it, since its licensing is very permissive.
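The two steps can be imitated in Python itself as a rough sketch. The name round_via_string is hypothetical, the sketch only handles ndigits >= 0, and it assumes the interpreter's "%f" formatting is correctly rounded (true for CPython builds that use the dtoa-based code). Exact ties follow the formatter's round-half-even behaviour, which matches the Python 3 rule rather than the Python 2 one.

```python
def round_via_string(x, ndigits):
    """Hypothetical sketch of format-then-parse rounding, for ndigits >= 0."""
    if ndigits < 0:
        raise ValueError("this sketch only handles ndigits >= 0")
    # Step 1: write the float as a decimal string with the requested precision.
    text = "%.*f" % (ndigits, x)
    # Step 2: parse the string back into the nearest float.
    return float(text)

print(round_via_string(0.493125, 5))  # 0.49312
```

A C++ port would need the same guarantee from whatever formatting and parsing routines it uses, which is why the dtoa code is a natural starting point.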

Dietrich Epp
  • Great answer, thank you. It gives me a lot to think about. My main goal of making an independent implementation is to improve on the existing Python code speed-wise. I'd like to avoid float->string->float conversion if possible, but I understand that more work than the trivial implementation I suggested is required. – Jason R Aug 15 '18 at 15:36
  • @JasonR: The hard part about doing float->string or string->float correctly is maintaining the required level of precision to get the correct results for all inputs. This is actually very difficult, and rounding a number correctly involves most of the same difficulties. This suggests to me that if you try to avoid the float->string->float conversion you are likely to end up spending an enormous amount of effort duplicating work that's already been done, you may introduce subtle or hard to detect errors, and in the end your code may not be much faster. – Dietrich Epp Aug 15 '18 at 16:29
  • @JasonR: You can think of `dtoa` as formatting a number as a string, but really what it's doing is solving the problem "given a float, what is the closest decimal with the given precision" and its reverse is "given a decimal, what is the closest float." These are *exactly* the problems that you would need to solve to implement `round()` correctly, and if you need variable precision decimal numbers, it is convenient and efficient to store them in strings. – Dietrich Epp Aug 15 '18 at 16:33

The Python documentation for 2.7 specifies the behaviour:

Values are rounded to the closest multiple of 10 to the power minus ndigits; if two multiples are equally close, rounding is done away from 0.

For 3.7:

For the built-in types supporting round(), values are rounded to the closest multiple of 10 to the power minus ndigits; if two multiples are equally close, rounding is done toward the even choice

Update:

The (CPython) implementation can be found in floatobject.c, in the function float___round___impl, which calls round if ndigits is not given, but double_round if it is.

double_round has two implementations. One converts the double to a string (aka decimal) and back to a double. The other one does some floating-point calculations, calls pow, and at its core calls round. It seems to have more potential problems with overflows, since it actually multiplies the input by 10**ndigits.

For the precise algorithm, look at the linked source file.

YSelf
  • This not only does not explain the behavior in the question, but it's also incorrect! If you actually read the documentation for Python 3.7 the behavior is **different.** – Dietrich Epp Aug 15 '18 at 15:03
  • I understand that, but the example I gave above isn't consistent with that, assuming the naive implementation that I came up with. That's the essence of my question. – Jason R Aug 15 '18 at 15:03
  • @DietrichEpp You are right, 3 is not doing the "away from 0" rounding. – YSelf Aug 15 '18 at 15:05
  • @JasonR Did you read the note next to the documentation? It explains the 'weird' behaviour. – MegaIng Aug 15 '18 at 15:06
  • @MegaIng: Yes, I understand that floating-point arithmetic is not exact. However, I would like to be able to produce calculations that agree with Python's `round()` function when `ndigits != 0`. In this case, due to the inexactness of floating-point arithmetic, exactly how it goes about performing the rounding is critical. That's what I'm trying to determine: is this behavior specified and/or consistent across interpreter versions/implementations? – Jason R Aug 15 '18 at 15:08
  • @JasonR This behavior is consistent for CPython and probably all other implementations that use binary floating-point numbers. If you want to know how it is implemented, look at the CPython source code. – MegaIng Aug 15 '18 at 15:10
  • @JasonR The implementation is here: https://github.com/python/cpython/blob/e0b5b2096ead4cc094a129ce7602ac5c0e998086/Objects/floatobject.c#L927 – MegaIng Aug 15 '18 at 15:17
  • I've added a summary of the algorithm in the answer. The short story is double to string to double conversion. – YSelf Aug 15 '18 at 15:31