
I have an array and I would like to calculate the sum of the elements column-wise (column_sum), then divide each column's elements by its column_sum, so that after the division the elements of every column sum to 1.

Code:

import numpy as np

# sample array
arr = np.array([[0.045, 0.531, 0.53],
                [0.968, 0.051, 0.013],
                [0.653, 0.304, 0.332],
                [0.065, 0.123, 0.033],
                [0.035, 0.328, 0.333],
                [0.065, 0.330, 0.333]], np.float32)

print("before\n", arr)
# sum() on a float32 array already returns float32, so no extra cast is needed
arr_sum = arr.sum(axis=0)
arr = arr / arr_sum
print("\nafter\n", arr)
print("\ncolumn_sum after division\n")
print(arr.sum(axis=0))

Here I take the column_sum and divide each column's elements by their corresponding column_sum.
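As an aside, the same column-wise normalization can be written in one step with `keepdims=True` (a minor variant of the code above, not a fix for the precision issue):

```python
import numpy as np

arr = np.array([[0.045, 0.531, 0.53],
                [0.968, 0.051, 0.013]], np.float32)

# keepdims=True keeps the sums as shape (1, 3), which broadcasts
# cleanly over the rows without an intermediate variable
normalized = arr / arr.sum(axis=0, keepdims=True)
print(normalized.dtype)   # still float32
```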

The above code is giving me an output like this:

before

[[0.045 0.531 0.53 ]
 [0.968 0.051 0.013]
 [0.653 0.304 0.332]
 [0.065 0.123 0.033]
 [0.035 0.328 0.333]
 [0.065 0.33  0.333]]

after

[[0.02457674 0.31853628 0.33672175]
 [0.5286729  0.03059388 0.00825921]
 [0.35663575 0.1823635  0.21092758]
 [0.03549973 0.07378524 0.02096569]
 [0.01911524 0.19676064 0.21156292]
 [0.03549973 0.19796039 0.21156292]]

column_sum after division

[1.         0.99999994 1.0000001 ]

but the actual column_sum has to be precisely 1 (a sum of probabilities), i.e. an output like this:

[1.    1.    1.] 

This is what happens when using the float32 datatype. The elements have to stay of type numpy.float32 and the column sums have to be 1. Is there any way to overcome this?

N G

1 Answer

Welcome to floating-point arithmetic. Remember that the number 0.045 cannot be represented exactly in binary: it is an infinitely repeating value, so what you get in the registers is an approximation. As you do more math, those approximation errors accumulate, and you never get exactly 1.0. If you need to print the values rounded, then do the rounding when you print.
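A sketch of what that looks like in practice: compare against 1 with a tolerance (`np.allclose`), and round only for display, leaving the underlying float32 data untouched:

```python
import numpy as np

arr = np.array([[0.045, 0.531, 0.53],
                [0.968, 0.051, 0.013],
                [0.653, 0.304, 0.332],
                [0.065, 0.123, 0.033],
                [0.035, 0.328, 0.333],
                [0.065, 0.330, 0.333]], np.float32)

arr = arr / arr.sum(axis=0)
col_sum = arr.sum(axis=0)

# Never compare floats with ==; use a tolerance instead
print(np.allclose(col_sum, 1.0))   # True

# Round only at print time; arr itself stays float32 and unrounded
print(np.round(col_sum, 3))        # [1. 1. 1.]
```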

Tim Roberts