I am trying to calculate variances via np.std(array, ddof=0). The problem appears when I have a long array of identical values (a delta-like distribution): instead of returning std = 0, it gives some small nonzero value, which in turn causes further estimation errors downstream. The mean is returned correctly. Example:
np.std([0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1],ddof = 0)
gives std = 1.80411241502e-16
but
np.std([0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1],ddof = 0)
gives std = 0
Is there a way to overcome this, other than checking the data for uniqueness on every iteration and skipping the std calculation entirely?
Thanks
P.S. Since this was marked as a duplicate of Is floating point math broken?, I am copy-pasting the reply by @kxr on why it is a different question:
"The current duplicate marking is wrong. Its not just about simple float comparison, but about internal aggregation of small errors for near-zero outcome by using the np.std on long arrays - as the questioner indicated extra. Compare e.g. >>> np.std([0.1, 0.1, 0.1, 0.1, 0.1, 0.1]*200000) -> 2.0808632594793153e-12
. So he can e.g. solve by: >>> mean = a.mean(); xmean = round(mean, int(-log10(mean)+9)); std = np.sqrt(((a - xmean) ** 2).sum()/ a.size)
"
The problem certainly starts with the floating-point representation, but it does not stop there. @kxr - I appreciate the comment and the example.
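For what it is worth, the accumulation can be made visible by inspecting the intermediate values (a sketch; the exact residuals depend on the array length and on numpy's summation order):

    import numpy as np

    a = np.array([0.1] * 90)               # long constant array, as above
    m = a.mean()
    print(repr(m))                         # typically not exactly 0.1: the summed total rounds
    print(np.abs(a - m).max())             # deviations on the order of 1e-16 instead of 0
    print(np.sqrt(((a - m) ** 2).mean()))  # the same formula np.std(a, ddof=0) evaluates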