
According to the documentation, Python rounds values toward the even choice if the upper and lower rounded values are equally close to the original number.
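As a quick illustration of that tie-breaking rule, here is a minimal sketch using only the built-in `round` (the list name `values` is just for illustration):

```python
# Built-in round() resolves exact .5 ties toward the nearest even integer.
values = [0.5, 1.5, 2.5, 3.5]
rounded = [round(v) for v in values]
print(rounded)  # [0, 2, 2, 4]
```

Note that 0.5 and 2.5 go down while 1.5 and 3.5 go up, because the even neighbor wins each tie.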

I want to round values in my pandas.DataFrame such that 0.5 is always rounded up.

One way to fix this would be to use the decimal module with the Decimal data type, as described in "How to properly round up half float numbers in Python?". Here is the default behavior I want to change:

import pandas as pd

if __name__ == "__main__":
    df = pd.DataFrame(data=[0.5, 1.499999, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5], columns=["orig"])
    df["round"] = df["orig"].round()
    print(df)

which outputs:

       orig  round
0  0.500000    0.0
1  1.499999    1.0
2  1.500000    2.0
3  2.500000    2.0
4  3.500000    4.0
5  4.500000    4.0
6  5.500000    6.0
7  6.500000    6.0

I tried to do something like:

df["round"] = df["orig"].values.astype(Decimal).round()

but this does not work. Is there a simple and readable solution to ensure that .5 is always rounded up?

EDIT

I'm not sure the links in the comments answer the question. The solution presented there casts every float to a string and then manipulates the strings, which seems ridiculous for large DataFrames (and is very hard to read and understand). I was hoping for a simple function, like the ones in the decimal package.

Arturo Sbr
user7431005
  • Look [here](https://stackoverflow.com/questions/31818050/round-number-to-nearest-integer) –  Jun 04 '21 at 13:04
  • Possible duplicate of [this](https://stackoverflow.com/questions/31818050/round-number-to-nearest-integer), could a mod please close this question. –  Jun 04 '21 at 13:05
  • Can you post an example of how this solves the exact issue I have with DataFrames, i.e. rounding up on .5? – user7431005 Jun 04 '21 at 13:24

2 Answers


You can add a small value (0.1 here) to orig whenever the fractional part is exactly 0.5. That guarantees that any integer + 0.5 always rounds up to the next integer, while all other values are rounded as usual.

import numpy as np
df['round_up'] = np.round(np.where(df['orig'] % 1 == 0.5,
                                   df['orig'] + 0.1,
                                   df['orig']))
print(df)

       orig  round_up
0  0.500000       1.0
1  1.499999       1.0
2  1.500000       2.0
3  2.500000       3.0
4  3.500000       4.0
5  4.500000       5.0
6  5.500000       6.0
7  6.500000       7.0
Arturo Sbr
  • I liked your idea with the `%` operator. I ended up using an approach inspired by your answer: `df["round"] = np.floor(df["orig"])` followed by `df.loc[df["orig"] % 1 >= 0.5, "round"] += 1` – user7431005 Jun 04 '21 at 13:53
  • even nicer: `df["group"] = np.where(df["orig"] % 1 < 0.5, np.floor(df["orig"]), np.ceil(df["orig"]))` – user7431005 Jun 04 '21 at 14:05
  • Great! Yes, that last one is great. You could also try `df['orig'] - np.floor(df['orig'])` to isolate the decimal. This is just a wild guess, but I have a hunch that `%` may be slower than `-`. – Arturo Sbr Jun 04 '21 at 14:12
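For reference, the `np.where` variant from the comments can be wrapped in a small helper. This is only a sketch: the function name `round_half_up` is made up for illustration, and it assumes a float column. Ties go toward positive infinity, so -0.5 becomes 0 rather than -1.

```python
import numpy as np
import pandas as pd


def round_half_up(values: pd.Series) -> pd.Series:
    """Round to the nearest integer, sending exact .5 ties toward +infinity."""
    # % 1 yields the fractional part in [0, 1), also for negative inputs.
    frac = values % 1
    # Below .5 round down, otherwise (including exact ties) round up.
    rounded = np.where(frac < 0.5, np.floor(values), np.ceil(values))
    return pd.Series(rounded, index=values.index, name=values.name)


df = pd.DataFrame({"orig": [0.5, 1.499999, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]})
df["round"] = round_half_up(df["orig"])
print(df["round"].tolist())  # [1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```

Unlike the "add 0.1" trick, this never modifies the values before rounding, so 1.499999 still rounds down to 1.0.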

Using the decimal module, you could do

import decimal
import pandas as pd

df = pd.DataFrame(data=[0.5, 1.499999, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5], columns=["orig"])

# Round each value half-up; the column then holds Decimal objects (dtype object).
df.orig = df.orig.apply(
    lambda x: decimal.Decimal(x).to_integral_value(rounding=decimal.ROUND_HALF_UP)
)
valentin
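If plain floats are preferred over Decimal objects, the same call can be wrapped in a helper that converts the result back. The sketch below uses only the standard library; the name `half_up` is hypothetical. Note that `ROUND_HALF_UP` breaks ties away from zero, so -0.5 becomes -1.0.

```python
from decimal import Decimal, ROUND_HALF_UP


def half_up(x: float) -> float:
    """Hypothetical helper: nearest integer, exact .5 ties away from zero."""
    # Decimal(x) represents the float exactly, so no string conversion is needed.
    return float(Decimal(x).to_integral_value(rounding=ROUND_HALF_UP))


results = [half_up(v) for v in [0.5, 1.499999, 1.5, 2.5, -0.5]]
print(results)  # [1.0, 1.0, 2.0, 3.0, -1.0]
```

With a DataFrame, this could presumably be applied as `df["orig"].apply(half_up)`, though `apply` works element by element and may be slow on large columns.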