I am working on the following code snippet to compute the similarity of two images:

import cv2, sys
import numpy as np

def compute_histogram(path):
    # Compute the HOG (histogram of oriented gradients) descriptor of an image.
    hog = cv2.HOGDescriptor()
    im = cv2.imread(path)
    h = hog.compute(im)
    return h

def histogram_intersection(h1, h2):
    # Manual equivalent:
    # minima = np.minimum(h1, h2)
    # intersection = np.true_divide(np.sum(minima), np.sum(h2))
    # return intersection
    return cv2.compareHist(h1, h2, cv2.HISTCMP_INTERSECT)


h1 = compute_histogram(sys.argv[1])
h2 = compute_histogram(sys.argv[2])

h12 = histogram_intersection(h1, h2)


# To normalize, divide by the sum of the second histogram.
print(h12/np.sum(h2))

When executing the above code with two images as input, it outputs a value close to 1.0, which seems quite normal.

python3 t.py image1.png image2.png
0.9932124283243112

However, much to my surprise, when the last statement is written in the following way:

print(h12/sum(h2))

The output is different: a number larger than 1! The code is also much slower than before.

python3 t.py image1.png image2.png
[1.1126189]

Is this a bug in Python's sum function? Or am I missing something here? Thanks.

======== Update

Here is the output of print(h2):

[[0.0924307 ]
 [0.05680538]
 [0.07150667]
 ...
 [0.10983132]
 [0.17328948]
 [0.0688285 ]]

And h12:

4517725.058263534

1 Answer

These functions do the summing differently. Example:

import math
import numpy as np

a = np.full((9000,), 1/9000)
sum(a)
# 0.9999999999998027
np.sum(a)
# 0.9999999999999999
math.fsum(a)
# 1.0

So np.sum() gives a more accurate result than the built-in sum(), and math.fsum() is the most accurate of the three.
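
A big part of the difference is the order of the additions: np.sum uses pairwise summation on float arrays (it recursively splits the array and sums the halves), so rounding error grows roughly logarithmically with the length, while the built-in sum adds elements strictly left to right and lets the error accumulate linearly. A minimal sketch of the pairwise idea (pairwise_sum is only an illustration, not NumPy's actual implementation):

def pairwise_sum(x):
    # Recursively split the sequence in half and sum each half;
    # rounding error grows roughly like O(log n) instead of the
    # O(n) of a naive left-to-right loop.
    if len(x) <= 8:  # small base case: plain loop
        total = 0.0
        for v in x:
            total += v
        return total
    mid = len(x) // 2
    return pairwise_sum(x[:mid]) + pairwise_sum(x[mid:])

pairwise_sum(a)
# typically much closer to 1.0 than the plain left-to-right loop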

See fsum for an explanation:

math.fsum(iterable)

Return an accurate floating point sum of values in the iterable. Avoids loss of precision by tracking multiple intermediate partial sums. The algorithm’s accuracy depends on IEEE-754 arithmetic guarantees and the typical case where the rounding mode is half-even. On some non-Windows builds, the underlying C library uses extended precision addition and may occasionally double-round an intermediate sum causing it to be off in its least significant bit.

For further discussion and two alternative approaches, see the ASPN cookbook recipes for accurate floating point summation.
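
Those recipes are variants of compensated summation. As an illustration only, here is a sketch of the classic Kahan algorithm (fsum itself tracks exact partial sums with Shewchuk's algorithm rather than this code):

def kahan_sum(x):
    # Kahan compensated summation: carry the rounding error of each
    # addition in a separate compensation term and feed it back in.
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for v in x:
        y = v - comp
        t = total + y           # low-order bits of y may be lost here...
        comp = (t - total) - y  # ...and are recovered here
        total = t
    return total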


You may also want to have a look at accupy, which has some very informative charts on the accuracy of different summation methods.

The following is a really pathological example (taken from here); the accurate result is obviously exactly 20000:

a = [1, 1e100, 1, -1e100] * 10000
math.fsum(a), np.sum(a), sum(a)
# (20000.0, 0.0, 0.0)
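
Both sum() and np.sum() return 0.0 here because in double precision 1e100 + 1 rounds straight back to 1e100: every 1 is absorbed, and the 1e100/-1e100 pairs then cancel to zero. fsum() keeps exact partial sums, so all 20000 ones survive.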