0

A few weeks ago I was provided with a really useful piece of code to help me to round my health data outcomes to 1dp, using the principle of rounding 1.25 to 1.3 (Python 3 doesn't do this as standard). I've unfortunately come across an instance where my newly defined rounding rule isn't working though! Can someone please suggest an amendment to the my_round below please? It's so frustrating as I thought I'd got the perfect solution here.

import pandas as pd
import math
raw_data = {'AreaCode' : ['101', '101', '101'],
            'Disaggregation' : ['1864', '65Over', 'Total'],
            'Numerator' : [19.0, 82.0, 101.0],
            'Denominator' : [24.0, 160.0, 184.0]}

Data = pd.DataFrame(raw_data, columns = ['AreaCode', 'Disaggregation', 'Numerator', 'Denominator'])

def my_round(n, ndigits=1):
    try:
       part = n * 10 ** ndigits
       delta = part - int(part)
       # always round "away from 0"
       if delta >= 0.5 or -0.5 < delta <= 0:
           part = math.ceil(part)
       else:
           part = math.floor(part)
       val =  part/(10 ** ndigits)
    except ValueError:
       val = np.nan
    return val

Data['Outcome'] = (Data['Numerator'] / (Data['Denominator'])*100).apply(my_round)

When all this is run you can see that the 65Over outcome has rounded to 51.2 when the calculation is 82.0 / 160.0 * 100 = 51.250000. I have to be certain such instances will output as 51.3 in my published data.

nnn1234
  • 67
  • 6

1 Answers1

0

You are probably encountering 'half-even rounding' or 'bankers rounding'. 51.25 will round to 51.2, 51.35 will round to 51.4. It's very common in computing to prevent aggregation errors. You can customize this, though. See How to properly round up half float numbers in Python?

In addition, your code is likely not working because it requires exact-equality between floats for the delta >= 0.5 bit to work when delta is exactly 0.5. The division by 10 is probably giving you loss-of-precision such that you can't subsequently get exact equality.

Edit: Your issue is that the original calculation, 82.0 / 160.0 * 100, gives you an answer that is no longer exactly 51.25 due to loss-of-precision due to floating-point division/multiplication. Multiplying first such that numbers remain integers will help. But really, you need to use Decimal (https://docs.python.org/3/library/decimal.html) module to ensure that base-ten arithmetic gives you 'natural answers'.

Jason Crease
  • 1,896
  • 17
  • 17
  • Can you offer any pointers on what syntax will be needed to apply this decimal principle to my 'Outcome' column? The context provided in the decimal module notes is just using keyed-in numbers – nnn1234 Feb 21 '20 at 10:05
  • What's key here is that 82.0/160.0*100 != (82.0*100/160.0). This is due to how base-2 floating point arithmetic works. When doing multiplications and divisions, you often subtly lose precision such that . eg Try: print (my_round(82.0/160.0 * 100.0) == my_round((82.0 * 100.0)/160.0)). You might find that multiplying by 100 first like this helps give yout he answer your want. – Jason Crease Feb 21 '20 at 10:46
  • `print (my_round(51.25)) # gives 51.3 print (my_round(82.0 / 160.0 * 100.0)) # gives 51.2 print (my_round((82.0 * 100.0)/160.0)) # gives 51.3 ` – Jason Crease Feb 21 '20 at 10:47
  • As for an example of using Decimal, try this: `print ((Decimal(82.0) / Decimal(160.0) * Decimal(100)).quantize(Decimal('.1'), rounding=ROUND_HALF_DOWN)) # prints 51.2 print ((Decimal(82.0) / Decimal(160.0) * Decimal(100)).quantize(Decimal('.1'), rounding=ROUND_HALF_UP)) # prints 51.3` – Jason Crease Feb 21 '20 at 11:03
  • So in my example above would the syntax be something like ((Decimal([Data['Numerator']]) / Decimal([Data['Denominator']]) * Decimal(100)).quantize(Decimal('.1'), rounding=ROUND_HALF_UP)) – nnn1234 Feb 21 '20 at 12:06
  • You have a few extra square braces around Data[X], but roughly, yes, that's correct. – Jason Crease Feb 21 '20 at 13:11