
OK, I know float16 is not a real primitive type, but it is simulated by Python/numpy. However, the question is: if it exists and Python allows it to be used in array multiplication through the numpy.dot() function, why doesn't OpenBlas (or ATLAS) work properly with it? I mean, the multiplication works, but the parallel computation doesn't. Or, to put it another way (better, in my opinion): why does Python/numpy allow the use of float16 if we then cannot exploit the advanced functionality offered by OpenBlas/ATLAS?
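To make the symptom concrete, here is a minimal sketch (my own illustration; the matrix size is arbitrary and the timings will vary by machine) of what I am observing:

import numpy as np
import time

n = 2000                                   # arbitrary size, just for illustration
a32 = np.random.rand(n, n).astype(np.float32)
a16 = a32.astype(np.float16)

for a in (a32, a16):
    start = time.time()
    np.dot(a, a)                           # float32 is dispatched to BLAS (all cores); float16 falls back to a slow generic path
    print(a.dtype, time.time() - start)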

redcrow
    I have observed that `scipy.ndimage.gaussian_filter1d` (and functions that use it) break on `np.float16`. Not sure how this data type is encoded, but yours doesn't seem to be the only problem with it. – eickenberg Jul 05 '14 at 18:11

1 Answer


Numpy float16 is a strange and possibly evil beast. It is an IEEE 754 half-precision floating point number with 1 sign bit, 5 exponent bits, and 10 mantissa bits.
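To see that layout concretely, here is a small sketch (my own addition, assuming numpy is available) that reinterprets a float16 value as its raw 16-bit integer and extracts the three fields:

import numpy as np

x = np.float16(1.5)
bits = int(x.view(np.uint16))       # reinterpret the raw 16 bits as an unsigned integer
sign     = bits >> 15               # 1 sign bit
exponent = (bits >> 10) & 0x1F      # 5 exponent bits, biased by 15
mantissa = bits & 0x3FF             # 10 mantissa bits
print(sign, exponent, mantissa)     # 1.5 = +1.1 (binary) * 2**0, so this prints 0 15 512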

While it is a standard floating point format, it is a newcomer and not in wide use. Some GPUs support it, but hardware support in CPUs is not common. Newer processors have instructions to convert between 16-bit and 32-bit floats, but none to use it directly in mathematical operations. Because of this, and because of the lack of a suitable data type in common lower-level languages, the 16-bit float is slower to use than its 32-bit counterpart.

Only a few tools support it. Usually, the 16-bit float is regarded as a storage format which is converted into a 32-bit float before use.
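In practice that pattern looks roughly like the following sketch (file name and size are made up for illustration): keep the data on disk as float16, but widen it to float32 before doing any real arithmetic.

import numpy as np

data = np.random.rand(1000000).astype(np.float16)    # compact storage: 2 bytes per value
np.save('data_f16.npy', data)                        # hypothetical file name

work = np.load('data_f16.npy').astype(np.float32)    # convert to float32 before computing
result = work * work                                 # the math now runs at float32 speed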

Some benchmarks:

In [59]: from numpy import random, array

In [60]: r=random.random(1000000).astype('float32')

In [61]: %timeit r*r
1000 loops, best of 3: 435 us per loop

In [62]: r=random.random(1000000).astype('float16')

In [63]: %timeit r*r
100 loops, best of 3: 10.9 ms per loop
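If the array really has to be kept as float16, converting it to float32 before the arithmetic usually recovers most of the speed (a sketch continuing from the session above; I have not re-run the timings here):

r32 = r.astype('float32')   # one-time conversion cost
r32 * r32                   # runs at roughly the float32 speed shown above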

As a general rule, do not use it for anything other than compressed storage, and even then be aware of the compromise:

In [72]: array([3001], dtype='float16') - array([3000], dtype='float16')
Out[72]: array([ 0.], dtype=float16)
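What happens here (a short illustrative addition of mine): float16 has only an 11-bit significand, so above 2048 the representable values are 2 apart, and 3001 already rounds to 3000 before the subtraction takes place:

import numpy as np

print(np.float16(3001))              # 3000.0 -- 3001 is not representable and rounds to 3000
print(np.spacing(np.float16(3000)))  # 2.0 -- the gap between neighbouring float16 values in this range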
DrV
    "The 16-bit float is thus not a standard format". This is false. It's the IEEE 754 2008 binary16 type, very much a standard format. – Mark Dickinson Jul 05 '14 at 20:55
    See http://docs.scipy.org/doc/numpy/user/basics.types.html, where it's described explicitly: "sign bit, 5 bits exponent, 10 bits mantissa". – Mark Dickinson Jul 05 '14 at 20:56
  • Ooops. I'll revise my answer! Thanks. – DrV Jul 05 '14 at 21:13
  • @MarkDickinson: No good to leave plain wrong information in sight... And now I learnt a lot, as I dug a bit into the background of the half-precision float. So, I am both embarrassed (earlier ignorance) and happy (new knowledge). And owe you a warm thank you. – DrV Jul 05 '14 at 21:31
  • Yes, indeed this problem is related to another previous problem of mine about array multiplication (see this: http://stackoverflow.com/questions/24490760/multiplying-very-large-2d-array-in-python). Although I configured my system to work properly with Atlas (first) and OpenBlas (then), that multiplication doesn't exploit parallel computation if I set the array values to the `float16` type. Honestly, that drove me crazy... eventually I discovered it was a problem with the primitive type. I think numpy's documentation should state what can work and what cannot. – redcrow Jul 06 '14 at 15:38
  • @DrV: anyway, thank you for your answer and contribution (the same for Mark Dickinson). I'll leave this question open for a while, just to see if there are other contributions. – redcrow Jul 06 '14 at 15:39