2

As you can see here, rounding in Python (and Java, etc.) should not be done thoughtlessly.

If you want to round like you learned at school, you shouldn't do this:

>>> round(20.5)
20

To get that 'schoolbook' rounding, you would normally use the decimal module:

>>> import decimal
>>> decimal.Decimal(20.5).quantize(1, rounding=decimal.ROUND_HALF_UP)
Decimal('21')

In my opinion that is not Pythonic, and I will never be able to remember it.

Another option would be:

>>> int(20.5 + 0.5)
21

If you want to round to a specific number of decimal places, do:

>>> int(20.5555555555 * 1000 + 0.5) / 1000
20.556

Does that way of rounding produce any bad side effects?

Frank
  • Using `Decimal` is the way to go, because not every decimal number can be represented by the `float` type. As described in [the documentation](https://docs.python.org/3/library/functions.html#round), this can lead to unexpected behavior such as `round(2.675, 2)` giving `2.67` and not `2.68`. This is because the closest 64-bit `float` value that can represent `2.675` is actually a bit smaller than that. – a_guest Sep 07 '19 at 14:20
  • @a_guest: Thanks for your answer, but I would kindly point out that I asked whether the rounding method I use has bad side effects when rounding normal floats. – Frank Sep 07 '19 at 14:48
  • @Frank. What a_guest said is the bad side effect. – Mad Physicist Sep 07 '19 at 16:20
  • Note that there is a reason ``round`` behaves the way it does. Before trying to circumvent that, consider whether it actually is the worse side effect. See https://en.wikipedia.org/wiki/Rounding#Round_half_to_even – MisterMiyagi Sep 07 '19 at 16:38

2 Answers

3

What you describe is (almost) the round half up strategy. However, implemented with int it won't work for negative numbers:

>>> def round_half_up(x, n=0):
...     shift = 10 ** n
...     return int(x*shift + 0.5) / shift
... 
>>> round_half_up(-1.26, 1)
-1.2
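
The reason is that int truncates toward zero instead of rounding down:

>>> int(-12.1)
-12
>>> import math
>>> math.floor(-12.1)
-13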

Instead you should use math.floor in order to handle negative numbers correctly:

>>> import math
>>> 
>>> def round_half_up(x, n=0):
...     shift = 10 ** n
...     return math.floor(x*shift + 0.5) / shift
... 
>>> round_half_up(-1.26, 1)
-1.3

This strategy has the drawback that it tends to distort statistics of a collection of numbers, such as the mean or the standard deviation. Suppose you have collected some numbers and all of them end in .5; then rounding each of them up will clearly increase the average:

>>> numbers = [-3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5]
>>> N = len(numbers)
>>> sum(numbers) / N
0.0
>>> sum(round_half_up(x) for x in numbers) / N
0.5

If we use the round half to even strategy instead, some numbers are rounded up and others are rounded down, so they compensate for each other:

>>> sum(round(x) for x in numbers) / N
0.0

As you can see, for this example the average is preserved.
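
The same pattern holds for a larger sample (just for illustration, reusing the round_half_up helper from above): 2000 values ending in .5 whose integer parts are spread evenly over even and odd numbers.

>>> numbers = [k + 0.5 for k in range(-1000, 1000)]
>>> sum(numbers) / len(numbers)
0.0
>>> sum(round_half_up(x) for x in numbers) / len(numbers)
0.5
>>> sum(round(x) for x in numbers) / len(numbers)
0.0

Rounding every tie up again shifts the mean by +0.5, while round half to even leaves it untouched.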

Of course this only works if the tie cases are spread evenly over even and odd numbers. If there is a tendency to favor numbers of the form odd + 0.5, then this strategy won't prevent a bias either:

>>> numbers = [i + 0.5 for i in range(-3, 3, 2)]
>>> N = len(numbers)
>>> sum(numbers) / N
-0.5
>>> sum(round_half_up(x) for x in numbers) / N
0.0
>>> sum(round(x) for x in numbers) / N
0.0

For this set of numbers, round half to even happens to produce the same results as round half up (each odd + 0.5 rounds up to an even number), so both methods suffer from the same bias.

As you can see, the rounding strategy clearly influences the bias of statistics such as the average. "Round half to even" tends to remove that bias, but it obviously favors even over odd numbers and thus also distorts the original distribution.

A note on float objects

Due to limited floating point precision, this "round half up" algorithm might also produce surprising results:

>>> round_half_up(-1.225, 2)
-1.23

Interpreting -1.225 as a decimal number, we would expect the result to be -1.22 instead. We get -1.23 because the intermediate floating point number in round_half_up ends up slightly below its expected value of -122.0:

>>> f'{-1.225 * 100 + 0.5:.20f}'
'-122.00000000000001421085'

Flooring that number gives us -123 (instead of the -122 we would get from -122.0). That's due to floating point error: -1.225 is actually not stored as -1.225 in memory but as a number that is a tiny bit smaller (i.e. slightly more negative). For that reason, using Decimal is the only way to get correct decimal rounding in all cases.
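
Here is a minimal sketch of that Decimal-based approach (the helper name round_half_up_exact is my own). Passing the value as a string keeps the decimal digits exact; note that decimal's ROUND_HALF_UP breaks ties away from zero. The value 2.675 is the example from the comments above, where the built-in round gives 2.67:

>>> from decimal import Decimal, ROUND_HALF_UP
>>> 
>>> def round_half_up_exact(x, n=0):
...     exp = Decimal(1).scaleb(-n)  # e.g. Decimal('0.01') for n=2
...     return Decimal(x).quantize(exp, rounding=ROUND_HALF_UP)
... 
>>> round_half_up_exact('20.5')
Decimal('21')
>>> round_half_up_exact('2.675', 2)
Decimal('2.68')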

a_guest
  • Just use `repr` (or equivalent formatting) to see all the relevant digits of a `float`. – Davis Herring Sep 07 '19 at 16:42
  • I disagree that `round_half_up(-1.225, 2)` **should** be `-1.22`. The argument simply really is closer to `-1.23` (because the decimal number was rounded to the nearest `float` like any other such literal). It’s not the place of a rounding function to guess the **origin** of the number it was given. One could consider a `round_half_up("-1.225", 2)` that would have sufficient information to select -1.22 as a result; that’s the approach taken by the `Decimal` class, of course. – Davis Herring Sep 07 '19 at 16:47
  • @DavisHerring `repr` will show as many digits as are necessary to get the same binary representation when doing `float(repr(x))` (i.e. select the shortest option of the (infinitely) many that do so). Hence `repr` might disguise that a float literal is stored as a number that is different from the literal (as a decimal number). I never said that `round_half_up(-1.225, 2)` should return `-1.22`, and it shouldn't due to floating point math. However for the user `-1.225` most likely means a *decimal number* and thus they might expect `round_half_up` to return `-1.22`. – a_guest Sep 09 '19 at 14:41
  • Also note that rounding schemas are independent of floating point math implementations. So when a user says they want to round `-1.225` up (as a decimal number) the result should be `-1.22`. It just happens that when storing `-1.225` as an IEEE 754 64-bit float the result is different due to floating point error. – a_guest Sep 09 '19 at 14:41
  • I understand all that. I meant to use `repr` instead of `.20f` to show the non-integer nature of the -122.00… value, not to inspect the original `float`. You did say that “The result should’ve been”. A *user* saying “-1.225” is not the same as a Python programmer writing that as a *`float`* literal (or *choosing* to call `float` on user input). – Davis Herring Sep 09 '19 at 15:15
  • @DavisHerring Indeed, I just reviewed my answer and rephrased that part. Thanks for the pointer. Regarding `float` literals, not all programmers might be aware of the subtleties that exist around floating point representation. After all this answer was composed in response to a programming question, asking about rounding schemas using `float`. – a_guest Sep 09 '19 at 16:42
  • You’re welcome. I’m certainly not discouraging educating programmers about floating-point (literals and otherwise); I just think it’s best to phrase it as “This is what you should consider when choosing a data type” rather than “`float` damages your beautiful decimal literals”. – Davis Herring Sep 10 '19 at 00:08
1

My understanding is that your suggestion of int(x + 0.5) should work fine, because it returns an integer object, which is exact. However, your subsequent suggestion of dividing by 1000 to round to a certain number of decimal places returns a floating point object, so it will suffer from exactly the issue you are trying to avoid. Fundamentally, you cannot avoid the issue of floating point precision unless you avoid the float type entirely, by using either decimal or pure integers.
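
A quick way to see this: the result of the decimal-places version prints as 20.556, but the underlying float is not exactly the decimal value 20.556 (comparing against a Decimal makes that visible):

>>> from decimal import Decimal
>>> x = int(20.5555555555 * 1000 + 0.5) / 1000
>>> x
20.556
>>> x == Decimal('20.556')
False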

Simon Notley