0

I am trying to create a new column in my pandas dataframe that holds the result of a basic arithmetic calculation on other columns in the dataset. The problem is that the values stored in the new column appear heavily rounded and do not represent the true values.

2.5364 should not be rounded off to 2.5 and 3.775 should not be rounded off to 3.8

I have tried declaring the denominators as floats in a bid to trick the system into supplying values with two decimal places, i.e. 12/3.00 should be 4.00, but this still returns 4.0 instead.
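As a minimal sketch of what is going on here (the variable names are only for illustration): in plain Python, 4.0 and 4.00 are the same float, and the number of decimals shown is purely a formatting choice.

```python
# Plain Python floats: 4.0 and 4.00 are the same value.
value = 12 / 3.00

same_value = (value == 4.00)          # trailing zeros carry no information
default_text = str(value)             # shortest representation of the float
two_decimals = format(value, ".2f")   # a string, for display only

print(same_value, default_text, two_decimals)  # True 4.0 4.00
```

Writing the denominator as 3.00 instead of 3.0 therefore cannot change how the result is displayed.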

This is currently what I am doing:

normal_load = 3
df['FirstPart_GPA'] = ((df[first_part].sum(axis = 1, skipna = True))/(normal_load*5.00))

I set skipna to True because a column might sometimes have no value, but I still want to be able to calculate the GPA without the system throwing errors, since any number plus NaN gives NaN.
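A small sketch of the skipna behaviour described above, using a hypothetical three-course frame in which the second student has no score for course3:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the second student did not take course3 (an elective).
scores = pd.DataFrame({
    "course1": [15, 12],
    "course2": [9, 6],
    "course3": [12, np.nan],
})

# skipna=True (the default) ignores missing entries when summing each row.
totals_skip = scores.sum(axis=1, skipna=True)

# skipna=False lets a single NaN turn the whole row total into NaN.
totals_strict = scores.sum(axis=1, skipna=False)

print(totals_skip.tolist())    # [36.0, 18.0]
print(totals_strict.tolist())  # [36.0, nan]
```

Since skipna=True is already the default for sum, passing it explicitly mainly documents the intent.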

I am working with a dataframe that looks like this:

import pandas as pd

# Named data rather than dict to avoid shadowing the built-in dict type.
data = {'course1': [15, 12],
        'course2': [9, 6],
        'course3': [12, 15],
        'course4': [15, 3],
        'course5': [15, 9],
        'course6': [9, 12]}

df = pd.DataFrame(data)

Note that the dataframe I have contains some null values because some courses are electives. Please help me out. I am out of ideas.

  • 1
    _"ie 12/3.00 should be 4.00 but this is still returning 4.0 instead."_ - See https://stackoverflow.com/questions/588004/is-floating-point-math-broken - `4.00` and `4.0` mean exactly the same thing. You can force the numbers to be _displayed_ with two decimal places, but that applies to every number in that column. – Caramiriel May 22 '19 at 12:14
  • 1
    What is first_part? – Viktoriya Malyasova May 22 '19 at 12:26
  • Cannot reproduce. The example data that you show are all multiples of 3, so all the results need only one decimal digit. And when I tweaked a number to get a sum which is not a multiple of 3, my system displayed it with 6 decimal places. – Serge Ballesta May 22 '19 at 12:29
  • Have you tried using the round() function? df['FirstPart_GPA'] = ((df.sum(axis = 1, skipna = True))/(normal_load*5.00)).round(2) Otherwise, can you elaborate on the expected output? – Tox May 22 '19 at 12:33

3 Answers

1

You can add float formatting something like this:

result= "%0.2f" % your_calc_result

Example using this code:

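import pandas as pd

# Named data rather than dict to avoid shadowing the built-in dict type.
data = {'course1': [15, 12],
        'course2': [9, 6],
        'course3': [12, 15],
        'course4': [15, 3],
        'course5': [15, 9],
        'course6': [9, 12]}
df = pd.DataFrame(data)
normal_load = 3.0
result = []
# Format each row's GPA as a string with two decimal places.
for i in range(len(df.index)):
    result.append("%0.2f" % (float(df.loc[i].sum()) / (normal_load * 5.00)))
df['FirstPart_GPA'] = result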

Output:

   course1  course2  course3  course4  course5  course6 FirstPart_GPA
0       15        9       12       15       15        9          5.00
1       12        6       15        3        9       12          3.80
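The same result can probably be reached without the explicit loop. The following is only a sketch of one vectorized alternative, not a replacement for the code above; the name gpa is introduced here for illustration.

```python
import pandas as pd

data = {'course1': [15, 12],
        'course2': [9, 6],
        'course3': [12, 15],
        'course4': [15, 3],
        'course5': [15, 9],
        'course6': [9, 12]}
df = pd.DataFrame(data)
normal_load = 3.0

# Row-wise sum divided by the total load; this Series stays numeric.
gpa = df.sum(axis=1) / (normal_load * 5.00)

# Formatting turns each value into a string such as '5.00'.
df['FirstPart_GPA'] = gpa.map("{:.2f}".format)

print(df)
```

Note that in both versions the new column holds strings, so any further arithmetic should be done on the numeric values (gpa here) rather than on the formatted column.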
ALFAFA
  • Thank you mf Al Fafa for your input but I need the result to be saved as a new column in the dataframe. For example, the output from your code should be in the column called "FirstPart_GPA" – MsTechy_J007 May 22 '19 at 12:43
  • now, it will add a 'FirstPart_GPA' column to DataFrame and display the result of calculation – ALFAFA May 22 '19 at 12:51
  • I get it now. I would try this because it looks more wholesome than my previous code and I bet I would need to implement something like this in the future. Thank you so much for your time. – MsTechy_J007 May 22 '19 at 17:23
1

You have not defined the first_part variable in your code, so I am going to assume it is some subset of the dataframe's columns, e.g.:

first_part=['course1', 'course2', 'course3']

All of the numbers in your dataframe are integer multiples of 3; therefore, when you sum any of them and divide by 15, you will always get a number with no more than one digit after the decimal point. Your values are not being rounded; one decimal digit is all they need.

To display numbers with two digits after the decimal point, add a line:

pd.options.display.float_format = '{:,.2f}'.format

Now

df['FirstPart_GPA'] = ((df[first_part].sum(axis = 1, skipna = True))/(normal_load*5.00))
df
   course1  course2  course3  course4  course5  course6  FirstPart_GPA
0       15        9       12       15       15        9           2.40
1       12        6       15        3        9       12           2.20
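To illustrate that the float_format option only affects how floats are printed, here is a hedged sketch that applies the format temporarily with pd.option_context and then inspects the stored values. It assumes the same three-column first_part as above and uses a reduced frame for brevity.

```python
import pandas as pd

df = pd.DataFrame({'course1': [15, 12],
                   'course2': [9, 6],
                   'course3': [12, 15]})
first_part = ['course1', 'course2', 'course3']  # assumed subset, as in this answer
normal_load = 3

df['FirstPart_GPA'] = df[first_part].sum(axis=1, skipna=True) / (normal_load * 5.00)

# option_context applies the display format only inside the with-block.
with pd.option_context('display.float_format', '{:,.2f}'.format):
    shown = df.to_string()

# The underlying values are untouched floats.
stored = df['FirstPart_GPA'].tolist()

print(shown)
print(stored)
```

The printed table shows 2.40 and 2.20, while the column itself still holds the floats 2.4 and 2.2. If the values themselves should be rounded, Series.round(2) does that, but it does not add trailing zeros.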
Viktoriya Malyasova
0

OMG! I now see what the problem is. I just threw my file into Excel and did the calculation there, and it turns out that the code is fine. I am sorry for taking up your time, and I appreciate your quick responses.

I always assumed that GPAs would have lots of decimals but the code uses a 5-point grading system which means that if a student has an A in a course that has a course load of 3, she would have scored 15 points.

A student has to take 5 courses per semester, and each of them has a load of 3. This means that the total load across all 5 courses is 15.

So because the possible points a student can earn per course are all multiples of 3 (0, 3, 6, 9, 12, 15), when we divide the sum of the points across all 5 courses by 15, the 3 always cancels out, i.e. (3+12+12+3+9)/15 = 39/15 = 13/5.

Dividing by 5 is unproblematic: it never spills over into extra decimals, unlike 10/3, which keeps giving me recurring 3s in the decimal part. Therefore 13/5 = 2.6.
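The reasoning above can be checked with exact fractions. This is only a sketch, assuming 5 courses of 3 units each and the per-course points listed above; the names possible_points and total_units are made up for the example.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

possible_points = [0, 3, 6, 9, 12, 15]   # points for a 3-unit course (assumed)
total_units = 15                          # 5 courses x 3 units each

# Collect every GPA that 5 courses can produce, as an exact fraction.
gpas = set()
for combo in combinations_with_replacement(possible_points, 5):
    gpas.add(Fraction(sum(combo), total_units))

# Every GPA reduces to a fraction over 1 or 5, so one decimal digit is enough.
denominators = {g.denominator for g in gpas}

example = Fraction(3 + 12 + 12 + 3 + 9, 15)
print(sorted(denominators), example, float(example))  # [1, 5] 13/5 2.6
```

A denominator of 3, as in 10/3, would produce a recurring decimal, but it never appears here because every sum is itself a multiple of 3.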