
When using Numba's @jit with NumPy's float32 data type, I'm getting what look like truncation issues. It's largely noise, since it's far past the decimals I care about (around the 7th or 8th place), but I'd still like to know what's going on and whether I can fix it.

As an aside, I have to use the float32 data type to conserve memory!

Here's the code I'm using as a test:

import numpy as np
from test_numba import test_numba

np.random.seed(seed=1774);
number = 150;
inArray = np.round(np.float32((np.random.rand(number)-.5)*2),4); #set up a float32 with 4 decimal places
numbaGet = test_numba(inArray); #run it through
print("Get:\t"+str(numbaGet)+" Type: "+str(type(numbaGet)));
print("Want:\t"+str(np.mean(inArray))+" Type: "+str(type(np.mean(inArray)))); #compare to expected

Combined with the following function

import numpy as np
from numba import jit #, float32

@jit(nopython=True) #nogil=True, parallel=True, cache=True #float32(float32)
def test_numba(inArray):

    #outArray = np.float32(np.mean(inArray)); #forcing float32 did not change it
    outArray = np.mean(inArray);

    return outArray;

The output from this is:

Get:    0.0982406809926033 Type: <class 'float'>
Want:   0.09824067 Type: <class 'numpy.float32'>

And that seems to indicate that Numba is making it a Python float class (float64, as far as I understand it), doing the math, and then somehow losing precision.

If I switch to float64, the difference is greatly reduced:

Get:    0.09824066666666667 Type: <class 'float'>
Want:   0.09824066666666668 Type: <class 'numpy.float64'>

I'm not sure what I'm doing wrong here. Again, in my case it's an ignorable issue (the inputs only have 4 decimal places), but I'd still like to know why!
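
For reference, a minimal sketch (just an illustration, reusing the seed and setup from above) that compares the float32 and float64 means against math.fsum as a high-precision baseline:

import math
import numpy as np

# Illustrative comparison only: rebuild the same input array and compare
# the float32 and float64 means against math.fsum (a high-precision sum).
np.random.seed(seed=1774)
number = 150
inArray32 = np.round(np.float32((np.random.rand(number) - .5) * 2), 4)
inArray64 = inArray32.astype(np.float64)

reference = math.fsum(inArray64) / number      # high-precision reference mean
print("float32 mean:", np.mean(inArray32))     # ~7 significant decimal digits
print("float64 mean:", np.mean(inArray64))     # ~15-16 significant decimal digits
print("fsum reference:", reference)
print("float32 eps:", np.finfo(np.float32).eps)  # ~1.19e-07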

user2403531
  • "It's largely noise since it's far past the decimals I care about - around the 7th or 8th place" - `float32` doesn't have that kind of precision. – user2357112 Jan 18 '19 at 22:42
  • What makes you think Numba is *losing* precision? – user2357112 Jan 18 '19 at 22:43
  • `float32` has about 7 decimal digits of precision, as I understood it from reading about it ([wiki](https://en.wikipedia.org/wiki/Single-precision_floating-point_format), [Stack Overflow](https://stackoverflow.com/questions/13542944/how-many-significant-digits-have-floats-and-doubles-in-java)) - so it's around there. Additionally, regarding the loss of precision: the Numba return for the `float32` case ends in `68`, but the answer actually seems to be `66...`, so an ending of 6 or 7 could be correct depending on rounding or truncation, but an 8 isn't quite it. – user2403531 Jan 18 '19 at 22:48

1 Answer


The reason is that Numba doesn't use np.mean but rolls out its own version (in Numba's source, `zero` is initialized outside this snippet to the zero of the result type):

def array_mean_impl(arr):
    # Can't use the naive `arr.sum() / arr.size`, as it would return
    # a wrong result on integer sum overflow.
    c = zero
    for v in np.nditer(arr):
        c += v.item()
    return c / arr.size
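
A rough way to see that loop's effect outside Numba is to run the same single-accumulator float32 summation by hand and compare it with np.mean (a sketch only; it may not match Numba's result bit-for-bit, since the compiled code can order the additions differently):

import numpy as np

def naive_mean_float32(arr):
    # Same pattern as the rolled-out version above: one running float32
    # accumulator, one addition per element (naive summation).
    c = np.float32(0.0)
    for v in arr:
        c += v
    return c / arr.size

np.random.seed(seed=1774)
arr = np.round(np.float32((np.random.rand(150) - .5) * 2), 4)
print("naive loop:", naive_mean_float32(arr))
print("np.mean:   ", np.mean(arr))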

Some time ago I gave an answer to a very similar question about the difference between numpy.mean and pandas.mean (which uses bottleneck). Everything said there also applies here, so please take a look at it for more details; in a nutshell:

  • Naive summation, as used by Numba, has an error of O(n), where n is the number of summands.
  • NumPy uses an approach similar to pairwise summation, which is more precise, with an error of O(log n).
  • The differences are obvious for float32 but less so for float64, although the same problem is still present (see the sketch below).
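
To make the difference concrete, here is a small illustrative sketch (not from the linked answer): a naive single-accumulator float32 loop versus np.sum's pairwise summation, both measured against math.fsum as a high-precision reference:

import math
import numpy as np

# Illustrative only: compare naive float32 accumulation against NumPy's
# pairwise summation, using math.fsum as a high-precision reference.
rng = np.random.RandomState(0)
data = rng.rand(100000).astype(np.float32)

reference = math.fsum(data.astype(np.float64))  # high-precision reference sum

naive = np.float32(0.0)
for v in data:           # single running accumulator: error grows roughly O(n)
    naive += v

pairwise = np.sum(data)  # NumPy's pairwise summation: error roughly O(log n)

print("naive error:   ", abs(float(naive) - reference))
print("pairwise error:", abs(float(pairwise) - reference))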
ead
  • Very interesting - thank you! That also explains why np.sum(inArray)/inArray.size gives the same value as Numba's solution did (when I tested it) - it's the same method and suffers from the same issue. That's why I was so set on the types being the culprit; I thought the math was OK - hah! I'll ask on the Numba GitHub if they can implement this mean, and thanks again for the clear insight! – user2403531 Jan 22 '19 at 00:19