36

numpy's round int doesn't seem to be consistent with how it deals with xxx.5

In [2]: np.rint(1.5)
Out[2]: 2.0

In [3]: np.rint(10.5)
Out[3]: 10.0

1.5 is rounded up while 10.5 is rounded down. Is there a reason for this? Is it just and artifact of the inaccuracy of floats?

Edit

Is there a way to get the desired functionality where n.5 is rounded up i.e. to n+1 for both n = even or odd?

Community
  • 1
  • 1
Ben
  • 6,986
  • 6
  • 44
  • 71
  • Well, there's always Python's built-in `round`, which rounds a single value, with ties going away from zero. If you need to round all the elements in a numpy array, then of course using Python's `round` could get quite slow. But as Mark Ransom notes in a comment below, numpy doesn't appear to let you choose the rounding rule. – John Y Feb 19 '15 at 22:50
  • 2
    @JohnY Nowadays Python's `round` also rounds to the nearest even integer. – Kirill Bulygin Feb 22 '17 at 15:59
  • 1
    @KirillBulygin - You bring up a good point. Python 2 (including whatever the latest 2.7.x is) rounds away from zero for ties, but Python 3 rounds toward even for ties. – John Y Feb 22 '17 at 18:00

9 Answers9

17

So, this kind of behavior (as noted in comments), is a very traditional form of rounding, seen in the round half to even method. Also known (according to David Heffernan) as banker's rounding. The numpy documentation around this behavior implies that they are using this type of rounding, but also implies that there may be issues with the way in which numpy interacts with the IEEE floating point format. (shown below)

Notes
-----
For values exactly halfway between rounded decimal values, Numpy
rounds to the nearest even value. Thus 1.5 and 2.5 round to 2.0,
-0.5 and 0.5 round to 0.0, etc. Results may also be surprising due
to the inexact representation of decimal fractions in the IEEE
floating point standard [1]_ and errors introduced when scaling
by powers of ten.

Whether or not that is the case, I honestly don't know. I do know that large portions of the numpy core are still written in FORTRAN 77, which predates the IEEE standard (set in 1984), but I don't know enough FORTRAN 77 to say whether or not there's some issue with the interface here.

If you're looking to just round up regardless, the np.ceil function (ceiling function in general), will do this. If you're looking for the opposite (always rounding down), the np.floor function will achieve this.

Slater Victoroff
  • 21,376
  • 21
  • 85
  • 144
  • Good answer, do you happen to know how to get numpy to round up regardless of even or odd? – Ben Feb 19 '15 at 22:23
  • 4
    It has nothing to do with "how floats are represented", it has *everything* to do with the rounding mode being used. There are many to choose from, see e.g. http://en.wikipedia.org/wiki/IEEE_floating_point#Rounding_rules – Mark Ransom Feb 19 '15 at 22:28
  • P.S. There's no need to dive into the source, the same text is present in [the documentation](http://docs.scipy.org/doc/numpy/reference/generated/numpy.around.html#numpy.around). There doesn't appear to be any way to change which rounding rules are used. – Mark Ransom Feb 19 '15 at 22:32
  • "even? leave 'em!" is The One True Way to round numbers ;0) – Paul H Feb 19 '15 at 22:35
  • 1
    @Ben yes, np.ceil will do this. – Slater Victoroff Feb 19 '15 at 22:58
  • @MarkRansom are you certain? I was making that claim based on the developers assertion that is has to do with floating point representation, and I tend to trust the numpy developers, but the representation and the rounding method are two sides of a single coin here. Specifically referring to *due to the inexact representation of decimal fractions in the IEEE floating point standard* – Slater Victoroff Feb 19 '15 at 22:59
  • @SlaterTyranus that might be true for .05 or .005 etc. but .5 is never inexact, it's a power of 2 (2**-1). – Mark Ransom Feb 19 '15 at 23:14
  • @MarkRansom I'm not trying to start an argument here. I don't know the details of `numpy's` implementation, all I know is that `numpy` says the issue is with IEEE's floating point representation. I'm a messenger here, if you really think the documentation should be changed I suggest you open a pull request there. I can post your concern as an update here, but again I'm just the messenger and `numpy's` documentation is my best indication of why `numpy` rounds the way that it does. – Slater Victoroff Feb 19 '15 at 23:22
  • What I'm saying is that in the general case, the documentation is correct that floating point inaccuracies will affect the result. In this specific case, it is not - there is no floating point inaccuracy, and I gave you the reason why. That statement in your answer is misleading for this particular question. – Mark Ransom Feb 19 '15 at 23:32
  • 4
    @Ben You should not have accepted this incorrect answer. These values are exactly representable. The rounder is known as banker's rounding. Google it. As Mark says, this is documented. Slater, you should be aware of the form of values that are exactly representable. – David Heffernan Feb 19 '15 at 23:42
  • @MarkRansom I am very aware, and it's very worth noting that the documentation linked is directly applied to numpy's `rint` explanation. I'm going to edit the answer more fully instead of just adding it as an update, but I still urge you to open a `numpy` PR – Slater Victoroff Feb 19 '15 at 23:51
  • @DavidHeffernan I am extremely aware of values that are exactly representable, but the fact is that `numpy` is based on FORTRAN 77, which significantly pre-dates the IEEE convention. I wouldn't be surprised in the least if there was an awkward point of contact here involving some imprecise bit-hacking. In short, this question is `numpy` specific, and the documentation is unclear, and the `rint` draws directly from the FORTRAN implementation. Unless you are familiar with that piece of the codebase it's impossible to say how intentional this is. – Slater Victoroff Feb 19 '15 at 23:54
  • @SlaterTyranus: The fundamental scheme of devoting a portion of the available bits to the exponent, one bit to the sign, and the rest to the mantissa, has been around for a long, long time, well predating IEEE 754. You can think of IEEE as basically crystallizing and endorsing what was already the consensus representation. In this regard, there's no meaningful difference between Fortran 77 and IEEE 754. – John Y Feb 20 '15 at 02:07
  • @SlaterTyranus: In the passage you quoted, the part that says *"Results may also be surprising..."* refers to two kinds of surprise (judging by the word "also" there). The first kind is surprise that comes from not knowing which rounding rule NumPy uses. If you expected "ties round away from zero", as OP did, then you will be surprised by NumPy's rounding. So the first part of the passage (*"For values exactly halfway between rounded decimal values, Numpy rounds to the nearest even value."*) addresses that. This kind of surprise is completely unrelated to floating point representation. – John Y Feb 20 '15 at 03:15
  • @SlaterTyranus: The second kind of surprise is the surprise *"due to the inexact representation of decimal fractions in the IEEE floating point standard"*. Note that some decimal fractions *are* exactly represented in binary floating point, and these do *not* behave in surprising ways. – John Y Feb 20 '15 at 03:16
  • @SlaterTyranus: I think the note in the Python documentation accompanying its built-in [`round` function](https://docs.python.org/2/library/functions.html#round) illustrates the second kind of surprise well: *The behavior of round() for floats can be surprising: for example, round(2.675, 2) gives 2.67 instead of the expected 2.68. This is not a bug: it’s a result of the fact that most decimal fractions can’t be represented exactly as a float.* – John Y Feb 20 '15 at 03:18
  • 4
    Regarding NumPy's `ceil`: This will not do what OP wants. The ceiling of 10.1 and 10.5 are *both* 11.0. – John Y Feb 20 '15 at 03:23
17

This is in fact exactly the rounding specified by the IEEE floating point standard IEEE 754 (1985 and 2008). It is intended to make rounding unbiased. In normal probability theory, a random number between two integers has zero probability of being exactly N + 0.5, so it shouldn't matter how you round it because that case never happens. But in real programs, numbers are not random and N + 0.5 occurs quite often. (In fact, you have to round 0.5 every time a floating point number loses 1 bit of precision!) If you always round 0.5 up to the next largest number, then the average of a bunch rounded numbers is likely to be slightly larger than the average of the unrounded numbers: this bias or drift can have very bad effects on some numerical algorithms and make them inaccurate.

The reason rounding to even is better than rounding to odd is that the last digit is guaranteed to be zero, so if you have to divide by 2 and round again, you don't lose any information at all.

In summary, this kind of rounding is the best that mathematicians have been able to devise, and you should WANT it under most circumstances. Now all we need to do is get schools to start teaching it to children.

9

Numpy rounding does round towards even, but the other rounding modes can be expressed using a combination of operations.

>>> a=np.arange(-4,5)*0.5
>>> a
array([-2. , -1.5, -1. , -0.5,  0. ,  0.5,  1. ,  1.5,  2. ])
>>> np.floor(a)      # Towards -inf
array([-2., -2., -1., -1.,  0.,  0.,  1.,  1.,  2.])
>>> np.ceil(a)       # Towards +inf
array([-2., -1., -1., -0.,  0.,  1.,  1.,  2.,  2.])
>>> np.trunc(a)      # Towards 0
array([-2., -1., -1., -0.,  0.,  0.,  1.,  1.,  2.])
>>> a+np.copysign(0.5,a)   # Shift away from 0
array([-2.5, -2. , -1.5, -1. ,  0.5,  1. ,  1.5,  2. ,  2.5])
>>> np.trunc(a+np.copysign(0.5,a))   # 0.5 towards higher magnitude round
array([-2., -2., -1., -1.,  0.,  1.,  1.,  2.,  2.])

In general, numbers of the form n.5 can be accurately represented by binary floating point (they are m.1 in binary, as 0.5=2**-1), but calculations expected to reach them might not. For instance, negative powers of ten are not exactly represented:

>>> (0.1).as_integer_ratio()
(3602879701896397, 36028797018963968)
>>> [10**n * 10**-n for n in range(20)]
[1, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
 0.9999999999999999, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
Yann Vernier
  • 15,414
  • 2
  • 28
  • 26
5

Numpy uses bankers rounding so .5 is rounded to the nearest even number. If you always want to round .5 up but round .4 down:

np.rint(np.nextafter(a, a+1))

or if you always want to round .5 down and .4 down but .6 up:

np.rint(np.nextafter(a, a-1))

NB this also works with np.around if you want the same logic but not integers.

>>> a = np.array([1, 1.5, 2, 2.5, 3, 3.5])
>>> np.rint(a)
array([1., 2., 2., 2., 3., 4.])
>>> np.rint(np.nextafter(a, a+1))
array([1., 2., 2., 3., 3., 4.])
>>> np.rint(np.nextafter(a, a-1))
array([1., 1., 2., 2., 3., 3.])

What is happening? nextafter gives the next representable number in a direction, so this is enough to push the number off 'exactly' 2.5.

Note this is different to ceil and floor.

>>> np.ceil(a)
array([1., 2., 2., 3., 3., 4.])
>>> np.floor(a)
array([1., 1., 2., 2., 3., 3.])
JSharm
  • 1,117
  • 12
  • 11
  • 2
    `nextafter` will have an advantage of less lost precision compared to adding `0.5` to a very small number. `copysign` may still be relevant. – Yann Vernier Feb 04 '20 at 09:12
4

An answer to you edit:

y = int(np.floor(n + 0.5))
Arko B
  • 49
  • 1
  • 4
    This is simple but seems to work for me. I also tested it on a np.array with : `y = np.floor(arr+0.5).astype(int)` – CharlesG Jun 16 '21 at 13:48
0

The built-in round function seems to do what you want, although it only works on scalars:

def correct_round(x):
    try:
        y = [ round(z) for z in x ]
    except:
        y = round(x)    
    return y

and then to verify:

print correct_round([-2.5,-1.5,-0.5,0.5,1.5,2.5])
> [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
Brian
  • 1,988
  • 1
  • 14
  • 29
0

Not sure its the most efficient solution but it works:

signs = np.sign(arr)
tmp = signs * arr
arr = np.floor(tmp + 0.5)
arr = arr * signs
Or Davidi
  • 99
  • 7
0

round half up function for for scalar, list and numpy array:

import numpy as np

def round_half_up(x):
    round_lambda = lambda z: (int(z > 0) - int(z < 0)) * int(abs(z) + 0.5)
    if isinstance(x, (np.ndarray, np.generic)):
        return np.vectorize(round_lambda)(x)
    else:
        return round_lambda(x)
GaloisPlusPlus
  • 403
  • 4
  • 20
0

This worked for me:

def my_round(a):
    return np.round(a)*(a-np.floor(a)!=0.5) + np.ceil(a)*(a-np.floor(a)==0.5)

>>> my_round([0.5, 1.5, 2.5, 3.5])

array([1., 2., 3., 4.])
Julia Meshcheryakova
  • 3,162
  • 3
  • 22
  • 42
mopix
  • 1