How can I deal with a precision problem in Python NumPy?

Question

I have the following function:

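import numpy as np

def get_acc(real_dpt, real_avg, pre_dpt, pre_avg, axis):
    # Prediction: ratio to its average, and the average itself
    delta_Rf = pre_dpt/pre_avg
    delta_Rf_avg = pre_avg

    # Observation: ratio to its average, and the average itself
    delta_Ro = real_dpt / real_avg
    delta_Ro_avg = real_avg

    # Differences that enter the correlation sums
    pre = delta_Rf - delta_Rf_avg
    obs = delta_Ro - delta_Ro_avg
    # Numerator and denominator of the correlation, reduced along `axis`
    d1 = np.sum(pre*obs, axis=axis)
    d2 = (np.sum(pre**2, axis=axis)*np.sum(obs**2, axis=axis))**0.5
    return d1/d2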

Setup:

The inputs obs_DPT, obs_AVG, cwrf_DPT and cwrf_AVG are ndarrays of the same shape:

shape = (29, 1452, 5), dtype = np.float32

First, I computed

result1 = get_acc(obs_DPT, obs_AVG, cwrf_DPT, cwrf_AVG, axis=1)
# result1.shape = (29, 5)  array

This works without any problem.

Then I computed the same quantity for a single slice:

result2 = get_acc(obs_DPT[i, :, 2:3], obs_AVG[i, :, 2:3], cwrf_DPT[i, :, 2:3], cwrf_AVG[i, :, 2:3], axis=0)
# i ranges over 0, 1, 2, ..., 28
# result2.shape=(1,)

Next, I took the matching element of result1:

result3 = result1[i, 2:3]
# result3.shape=(1,)

Then I compared the two:

if result2[0] == result3[0]:
    print("i={}, resul2={}, resul3={}".format(i, result2[0], result3[0]))

Of the 29 values of i, only the following are equal:

i=4, resul2=0.9601920247077942, resul3=0.9601920247077942
i=21, resul2=0.966850221157074, resul3=0.966850221157074
i=27, resul2=0.9409129023551941, resul3=0.9409129023551941

The others are not equal:

i=0, resul2=0.9641021490097046, resul3=0.9641022682189941
i=1, resul2=0.937653124332428, resul3=0.9376530647277832
i=2, resul2=0.9460444450378418, resul3=0.9460448026657104
i=3, resul2=0.9394290447235107, resul3=0.9394280314445496
i=5, resul2=0.9721810221672058, resul3=0.9721801280975342
i=6, resul2=0.9628128409385681, resul3=0.9628139734268188
i=7, resul2=0.9723774790763855, resul3=0.9723766446113586
i=8, resul2=0.9653074741363525, resul3=0.9653091430664062
i=9, resul2=0.9601299166679382, resul3=0.9601304531097412
i=10, resul2=0.9747092127799988, resul3=0.9747100472450256
i=11, resul2=0.9554705023765564, resul3=0.9554708003997803
i=12, resul2=0.9655697345733643, resul3=0.9655706286430359
i=13, resul2=0.9721916317939758, resul3=0.9721908569335938
i=14, resul2=0.9664390683174133, resul3=0.9664396643638611
i=15, resul2=0.949366569519043, resul3=0.9493645429611206
i=16, resul2=0.9490151405334473, resul3=0.9490156769752502
i=17, resul2=0.9765706062316895, resul3=0.9765700697898865
i=18, resul2=0.9661588668823242, resul3=0.9661590456962585
i=19, resul2=0.9475210905075073, resul3=0.9475213885307312
i=20, resul2=0.9426926374435425, resul3=0.9426922798156738
i=22, resul2=0.9612534642219543, resul3=0.9612541198730469
i=23, resul2=0.9686486721038818, resul3=0.9686492681503296
i=24, resul2=0.9685435891151428, resul3=0.968543291091919
i=25, resul2=0.9744826555252075, resul3=0.9744828343391418
i=26, resul2=0.9382848739624023, resul3=0.9382855892181396
i=28, resul2=0.9597892761230469, resul3=0.959788978099823

I also ran into a second problem. The arrays above have dtype = np.float32. When I convert them to dtype = np.float64, I still don't get equal values for result2 and result3.

I hope I have made this clear. How can I solve this problem?

Thanks a lot!

You are dealing with floating point numbers. Expecting exact equality is always a bad idea. — Frank Yellin, Oct 24 '20 at 06:00
Can't you show the results in an easier to read manner, for example `result2-result3`? — hpaulj, Oct 24 '20 at 06:03
Welcome to the uncertain world of floating point numbers. Checking for equality with floats is not a good idea, instead define a variable epsilon to some small value, maybe 1e-8 or something, and when checking for equality, just see if the difference is smaller than epsilon. This is the approach recommended in many guides and tutorials. — EnderShadow8, Oct 24 '20 at 06:24
Type 0.1+0.2 into the shell and look at the output to see why float equality is a bad idea. — EnderShadow8, Oct 24 '20 at 06:25
Just perform each step of your function side by side with the inputs of different shape and see which step causes the difference. This would help you to ask a more specific and meaningful question. — Dima Chubarov, Oct 24 '20 at 06:33
Just the classic and most famous one: 0.1+0.2 == 0.30000000000000004 — EnderShadow8, Oct 24 '20 at 06:44
Possibly related question https://stackoverflow.com/q/10371857/1328439 — Dima Chubarov, Oct 24 '20 at 07:42
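The tolerance-based comparison suggested in the comments can be sketched as follows. It uses math.isclose for Python floats and np.isclose for NumPy values; the two float32 numbers are copied from the i=0 line of the output in the question. The tolerance rtol=1e-5 is an assumption: the float32 differences shown above are on the order of 1e-7 to 1e-6, so a fixed absolute epsilon of 1e-8 would likely be too tight for this data.

```python
import math
import numpy as np

# The classic example from the comments: 0.1 + 0.2 is not exactly 0.3 in binary floating point
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True (default rel_tol=1e-09)

# Values from the i=0 line of the question's output, stored as float32
result2 = np.array([0.9641021490097046], dtype=np.float32)
result3 = np.array([0.9641022682189941], dtype=np.float32)

print(result2[0] == result3[0])  # False: exact comparison fails
# Compare with a relative tolerance instead; rtol=1e-5 is an assumed choice for float32 data
print(np.isclose(result2[0], result3[0], rtol=1e-5))  # True
```

In a loop over i, the `==` test in the question would simply be replaced by the `np.isclose` call.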

Dima Chubarov · Accepted Answer · 2020-10-24T10:04:32.230

Your function has a few steps and it might not be immediately clear which one is causing the difference.

If you perform your function on the full array and on a slice of it side by side and compare the results after each step, you will notice that the first step where a difference appears is the summation.
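A minimal sketch of that side-by-side comparison, assuming random float32 inputs in place of the real data. `acc_steps` is a hypothetical helper that repeats the arithmetic of get_acc but returns every intermediate array, so each step of the full-array run can be compared with the same step of the slice run.

```python
import numpy as np

def acc_steps(real_dpt, real_avg, pre_dpt, pre_avg, axis):
    """Hypothetical helper: same arithmetic as get_acc, returning each intermediate step."""
    pre = pre_dpt / pre_avg - pre_avg
    obs = real_dpt / real_avg - real_avg
    d1 = np.sum(pre * obs, axis=axis)
    d2 = (np.sum(pre**2, axis=axis) * np.sum(obs**2, axis=axis)) ** 0.5
    return {"pre": pre, "obs": obs, "d1": d1, "d2": d2, "acc": d1 / d2}

# Made-up positive float32 data with the question's shape (stand-ins for the real inputs)
rng = np.random.default_rng(0)
shape = (29, 1452, 5)
obs_DPT, obs_AVG, cwrf_DPT, cwrf_AVG = (
    rng.uniform(1.0, 2.0, size=shape).astype(np.float32) for _ in range(4)
)

i = 0
full = acc_steps(obs_DPT, obs_AVG, cwrf_DPT, cwrf_AVG, axis=1)
part = acc_steps(obs_DPT[i, :, 2:3], obs_AVG[i, :, 2:3],
                 cwrf_DPT[i, :, 2:3], cwrf_AVG[i, :, 2:3], axis=0)

diffs = {}
for name in ["pre", "obs", "d1", "d2", "acc"]:
    if full[name].ndim == 3:   # element-wise step: take the same slice as the inputs
        a = full[name][i, :, 2:3]
    else:                      # reduced step: shape (29, 5)
        a = full[name][i, 2:3]
    diffs[name] = float(np.max(np.abs(a - part[name])))
    print(name, "max abs diff:", diffs[name])
```

The element-wise steps (`pre`, `obs`) come out identical; any difference first shows up in the reductions `d1` and `d2`.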

Consider a simpler test:

import numpy as np

test = np.random.rand(29, 1452, 5)
sum1 = np.sum(test[0, :, 2:3])
sum2 = np.sum(test, axis=1)[0, 2:3]
print(np.max(abs(sum1 - sum2)))
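To check that neither sum is "wrong", both can be compared with a correctly rounded reference from Python's math.fsum. This is a sketch: the seeded generator and the use of math.fsum are additions for illustration, not part of the original test.

```python
import math
import numpy as np

# Seeded generator so the run is repeatable (the seed value is arbitrary)
rng = np.random.default_rng(42)
test = rng.random((29, 1452, 5))

sum1 = np.sum(test[0, :, 2])        # the slice summed on its own
sum2 = np.sum(test, axis=1)[0, 2]   # the same numbers summed inside a larger reduction
exact = math.fsum(test[0, :, 2])    # correctly rounded reference sum

print("sum1 - sum2:", sum1 - sum2)
print("error of sum1:", abs(sum1 - exact))
print("error of sum2:", abs(sum2 - exact))
```

Both errors are tiny compared with the sum itself; the two results can differ from each other only because they took different rounding paths.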

This example will likely already print a nonzero number.

The core of the problem appears to be that the order of summation differs depending on the shape of the input, and a different order accumulates floating-point rounding error differently.
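A deliberately extreme float32 example shows that the order of additions alone can change a result. The values are chosen for illustration and do not reproduce NumPy's internal order. For NumPy itself, the documentation of np.sum notes that its more accurate partial pairwise summation is used only when summing along the fast axis in memory, which is consistent with the shape dependence described here.

```python
import numpy as np

# Three float32 values whose exact sum is 1.0
a, b, c = np.float32(1e8), np.float32(1.0), np.float32(-1e8)

# Left to right: 1e8 + 1 rounds back to 1e8 in float32 (neighbouring floats are 8 apart there)
left_to_right = (a + b) + c
# Another order: the large terms cancel first, so the 1.0 survives
reordered = (a + c) + b

print(left_to_right)  # 0.0
print(reordered)      # 1.0
```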

For further reading, consider "What Every Computer Scientist Should Know About Floating-Point Arithmetic" by David Goldberg.

Update: the example can be simplified further. However, it is important that the axis along which np.sum is applied is not the last one.

import numpy as np

test = np.random.rand(1452, 5)
sum1 = np.sum(test[:, 2])
sum2 = np.sum(test, axis=0)[2]
print(np.max(abs(sum1 - sum2)))

Also, there is nothing special about the number 1452; a nonzero difference can frequently be observed for much smaller arrays.
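One way to deal with this in practice is to combine higher-precision arithmetic with a tolerance-based comparison. The sketch below assumes random float32 stand-in data; `get_acc64` is a hypothetical variant of get_acc that upcasts its inputs to float64 (passing `dtype=np.float64` to np.sum would be a lighter alternative), and `rtol=1e-10` is an assumed tolerance.

```python
import numpy as np

def get_acc64(real_dpt, real_avg, pre_dpt, pre_avg, axis):
    """Hypothetical variant of get_acc: same arithmetic, inputs upcast to float64."""
    real_dpt, real_avg, pre_dpt, pre_avg = (
        np.asarray(x, dtype=np.float64) for x in (real_dpt, real_avg, pre_dpt, pre_avg)
    )
    pre = pre_dpt / pre_avg - pre_avg
    obs = real_dpt / real_avg - real_avg
    d1 = np.sum(pre * obs, axis=axis)
    d2 = (np.sum(pre**2, axis=axis) * np.sum(obs**2, axis=axis)) ** 0.5
    return d1 / d2

# Made-up float32 inputs with the question's shape (stand-ins for the real data)
rng = np.random.default_rng(1)
shape = (29, 1452, 5)
obs_DPT, obs_AVG, cwrf_DPT, cwrf_AVG = (
    rng.uniform(1.0, 2.0, size=shape).astype(np.float32) for _ in range(4)
)

result1 = get_acc64(obs_DPT, obs_AVG, cwrf_DPT, cwrf_AVG, axis=1)

max_diff = 0.0
all_close = True
for i in range(shape[0]):
    result2 = get_acc64(obs_DPT[i, :, 2:3], obs_AVG[i, :, 2:3],
                        cwrf_DPT[i, :, 2:3], cwrf_AVG[i, :, 2:3], axis=0)
    result3 = result1[i, 2:3]
    max_diff = max(max_diff, float(np.max(np.abs(result2 - result3))))
    # Tolerance comparison instead of ==; rtol=1e-10 is an assumed choice for float64
    all_close = all_close and bool(np.allclose(result2, result3, rtol=1e-10, atol=0.0))

print("largest |result2 - result3|:", max_diff)
print("all close:", all_close)
```

Upcasting shrinks the discrepancy by several orders of magnitude but, as the question found with float64, it does not guarantee bit-for-bit equality; the tolerance comparison is what makes the check robust.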
