
What is the difference between tf.float16 and tf.bfloat16 as listed in https://www.tensorflow.org/versions/r0.12/api_docs/python/framework/tensor_types ?

Also, what do they mean by "quantized integer"?

JMC

2 Answers

bfloat16 is a TensorFlow-specific format that is different from IEEE's own float16, hence the new name. The "b" stands for (Google) Brain.

Basically, bfloat16 is a float32 truncated to its first 16 bits. So it has the same 8 bits for the exponent, but only 7 bits for the mantissa. It is therefore easy to convert to and from float32, and because it has essentially the same dynamic range as float32, it minimizes the risk of NaNs or exploding/vanishing gradients when switching from float32.
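To make the "truncated float32" description concrete, here is a rough NumPy sketch of the conversion. It does plain truncation (dropping the low 16 mantissa bits), matching the source comment quoted below; actual implementations may additionally round to nearest.

```python
import numpy as np

def float32_to_bfloat16_bits(x):
    # Reinterpret the float32 as raw bits and keep only bits 16-31:
    # the sign, the full 8-bit exponent, and the top 7 mantissa bits.
    bits = np.float32(x).view(np.uint32)
    return np.uint16(bits >> np.uint32(16))

def bfloat16_bits_to_float32(b):
    # Widening back is just zero-padding the low 16 mantissa bits.
    return (np.uint32(b) << np.uint32(16)).view(np.float32)

x = np.float32(3.14159)
b = float32_to_bfloat16_bits(x)
print(bfloat16_bits_to_float32(b))  # 3.140625, pi truncated to 7 mantissa bits
```

Note how cheap both directions are: a 16-bit shift, with no re-biasing of the exponent, which is exactly the simplicity the source comment is referring to.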

From the sources:

// Compact 16-bit encoding of floating point numbers. This representation uses
// 1 bit for the sign, 8 bits for the exponent and 7 bits for the mantissa.  It
// is assumed that floats are in IEEE 754 format so the representation is just
// bits 16-31 of a single precision float.
//
// NOTE: The IEEE floating point standard defines a float16 format that
// is different than this format (it has fewer bits of exponent and more
// bits of mantissa).  We don't use that format here because conversion
// to/from 32-bit floats is more complex for that format, and the
// conversion for this format is very simple.

As for quantized integers, they are designed to replace floating-point numbers in trained networks to speed up inference. Basically, they are a sort of fixed-point encoding of real numbers, albeit with an operating range chosen to represent the observed distribution of values at any given point of the net.

More on quantization here.

P-Gn
  • Thanks. In the source it also says: "Because of the existing IEEE float16 type, we do not name our representation "float16" but just use "uint16"." Why uint16? Is that likely an error in the doc, and it was meant to say bfloat16? – JMC Jul 02 '17 at 18:54
  • I think they are simply referring to [how bfloat16 is represented internally](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/numeric_types.h#L49). – P-Gn Jul 02 '17 at 18:57
  • Your second link on quantization is broken. – Z boson Nov 05 '18 at 11:47

Here is a picture describing the internals of the three floating-point formats (image omitted: side-by-side bit layouts of float32, float16, and bfloat16):

For more information see BFloat16: The secret to high performance on Cloud TPUs

alexqinbj