16-bit floats and GL_HALF_FLOAT

Question

I'm looking for/writing a C++ implementation of a 16-bit floating point number to use with OpenGL vertex buffers (texture coordinates, normals, etc). Here are my requirements so far:

Must be 16-bit (obviously).
Must be able to be uploaded to an OpenGL vertex buffer using GL_HALF_FLOAT.
Must be able to represent numbers outside the range -1.0 to +1.0 (otherwise I would just use normalized GL_SHORT).
Must be able to convert to and from a normal 32-bit float.
Arithmetic operations do not matter - I only care about storage.
Speed is not a primary concern, but correctness is.

Here's what I have so far for an interface:

class half
{
public:
    half(void) : data(0) {}
    half(const half& h) : data(h.data) {}
    half(const unsigned short& s) : data(s) {}
    half(const float& f) : data(fromFloat(f)) {}
    half(const double& d) : data(fromDouble(d)) {}

    inline operator const float() { return toFloat(data); }
    inline operator const double() { return toDouble(data); }

    inline const half operator=(const float& rhs) { data = fromFloat(rhs); return *this; }
    inline const half operator=(const double& rhs) { data = fromDouble(rhs); return *this; }

private:
    unsigned short data;

    static unsigned short fromFloat(float f);
    static float toFloat(short h);

    inline static unsigned short fromDouble(double d) { return fromFloat((float)d); }
    inline static double toDouble(short h) { return (double)toFloat(h); }
};

std::ostream& operator<<(std::ostream& os, half h) { os << (float)h; return os; }
std::istream& operator>>(std::istream& is, half& h) { float f; is >> f; h = f; return is; }

Ultimately, the real meat of the class lies in the toFloat() and fromFloat() functions, which is what I need help with. I've been able to find quite a few examples of 16-bit float implementations, but none of them mention whether they can be uploaded to OpenGL.

What are some concerns I should be aware of when uploading a 16-bit float to OpenGL? Is there a half-float implementation that specifically addresses these concerns?

EDIT: By popular demand, here is how my vertex data is generated, uploaded, and rendered.

Here is how the data is defined within the WireCubeEntity class:

VertexHalf vertices[8] = {
        vec3(-1.0f, -1.0f, -1.0f),
        vec3(1.0f, -1.0f, -1.0f),
        vec3(1.0f, 1.0f, -1.0f),
        vec3(-1.0f, 1.0f, -1.0f),
        vec3(-1.0f, -1.0f, 1.0f),
        vec3(1.0f, -1.0f, 1.0f),
        vec3(1.0f, 1.0f, 1.0f),
        vec3(-1.0f, 1.0f, 1.0f)
    };

    unsigned char indices[24] = {
        0, 1,
        1, 2,
        2, 3,
        3, 0,
        4, 5,
        5, 6,
        6, 7,
        7, 4,
        0, 4,
        1, 5,
        2, 6,
        3, 7
    };

    va.load(GL_LINES, VF_BASICHALF, 8, vertices, GL_UNSIGNED_BYTE, 24, indices);

where va is an instance of VertexArray. va.load is defined as:

MappedBuffers VertexArray::load(GLenum primitive, VertexFormat vertexFormat, unsigned int vertexCount, void* vertices,
                                                  GLenum indexFormat, unsigned int indexCount, void* indices)
{
    MappedBuffers ret;

    /* Check for invalid primitive types */
    if (primitive > GL_TRIANGLE_FAN)
    {
        error("in VertexFormat::load():\n");
        errormore("Invalid enum '%i' passed to 'primitive'.\n", primitive);
        return ret;
    }

    /* Clean up existing data */
    clean();

    /* Set up Vertex Array Object */
    glGenVertexArrays(1, &vao);
    bindArray();

    /* Create Vertex Buffer Object */
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, vertexSize(vertexFormat) * vertexCount, vertices, GL_STATIC_DRAW);
    if (!vertices) ret.vmap = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);

    /* Save variables for later usage */
    prim = primitive;
    vformat = vertexFormat;
    vcount = vertexCount;

    /* If we've been given index data, handle it */
    if (indexSize(indexFormat) != 0)
    {
        glGenBuffers(1, &ibo);
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
        glBufferData(GL_ELEMENT_ARRAY_BUFFER, indexSize(indexFormat) * indexCount, indices, GL_STATIC_DRAW);
        if (!indices) ret.imap = glMapBuffer(GL_ELEMENT_ARRAY_BUFFER, GL_WRITE_ONLY);

        iformat = indexFormat;
        icount = indexCount;
    }

    /* Handle the vertex format */
    switch (vformat)
    {
    case VF_BASIC:
        /* VF_BASIC only has a position - a 3-component float vector */
        glEnableVertexAttribArray(0);
        glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (void*)0);
        break;
    case VF_32:
        /* VF_32 has 3 components for position, 2 for texture coordinates, and 3 for a normal.
        Position is at offset 0, TextureCoordinate is at offset 12, and Normal is at offset 20 */
        glEnableVertexAttribArray(0);
        glEnableVertexAttribArray(1);
        glEnableVertexAttribArray(2);
        glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, vertexSize(VF_32), (void*)0);
        glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, vertexSize(VF_32), (void*)12);
        glVertexAttribPointer(2, 3, GL_FLOAT, GL_FALSE, vertexSize(VF_32), (void*)20);
        break;
    case VF_BASICHALF:
        /* VF_BASICHALF is very similar to VF_BASIC, except using half-floats instead of floats. */
        glEnableVertexAttribArray(0);
        glVertexAttribPointer(0, 3, GL_HALF_FLOAT, GL_FALSE, 0, (void*)0);
        break;
    case VF_WITHTANGENTS:
        /* VF_WITHTANGENTS is similar to VF_32, but with additional components for a Tangent. */
        /* Tangent is at offset 32 */
        glEnableVertexAttribArray(0);
        glEnableVertexAttribArray(1);
        glEnableVertexAttribArray(2);
        glEnableVertexAttribArray(3);
        glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, vertexSize(VF_WITHTANGENTS), (void*)0);
        glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, vertexSize(VF_WITHTANGENTS), (void*)12);
        glVertexAttribPointer(2, 3, GL_FLOAT, GL_FALSE, vertexSize(VF_WITHTANGENTS), (void*)20);
        glVertexAttribPointer(3, 3, GL_FLOAT, GL_FALSE, vertexSize(VF_WITHTANGENTS), (void*)32);
        break;
    default:
        error("In VertexFormat::load():\n");
        errormore("Invalid enum '%i' passed to vertexFormat.\n", (int)vformat);
        clean();
        return MappedBuffers();
    }

    /* Unbind the vertex array */
    unbindArray();

    if (vertices) ready = true;

    return ret;
}

It's a pretty heavy function, I know. MappedBuffers is simply a struct that contains 2 pointers so that if I pass NULL data into VertexArray::load(), I can use the pointers to load the data directly from file into buffers (possibly from another thread). vertexSize is a function that returns the sizeof() of whichever vertex format I pass in, or 0 for an invalid format.

The VertexHalf struct is:

struct VertexHalf
{
    VertexHalf(void) {}
    VertexHalf(vec3 _pos) :x(_pos.x), y(_pos.y), z(_pos.z) {}
    VertexHalf(float _x, float _y, float _z) : x(_x), y(_y), z(_z) {}

    half x, y, z, padding;
};

And finally the data is rendered using the VertexArray we loaded earlier:

void VertexArray::draw(void)
{
    if (ready == false)
        return;

    /* Bind our vertex array */
    bindArray();

    /* Draw its contents */
    if (ibo == 0)
        glDrawArrays(prim, 0, vcount);
    else
        glDrawElements(prim, icount, iformat, NULL);

    unbindArray();
}

This might be worth checking out: http://half.sourceforge.net/ (disclaimer: it's from me). It is IEEE/OpenGL-conformant, supports all arithmetics and conversions and strives for both performance and streamlined integration into the existing C++ infrastructure (with possible C++11 support where feasible). It should be perfectly able to be up/downloaded to/from OpenGL right away on any reasonable system. — Christian Rau, Mar 10 '14 at 23:21
The format that OpenGL expects is described in http://www.opengl.org/registry/specs/ARB/half_float_pixel.txt as 5 bits of exponent and 10 bits of mantissa. For float, it's 8 and 23, respectively. Can't one simply type-pun a float into a bitfield and copy sign, mantissa, and exponent over to another bitfield? — Damon, Mar 11 '14 at 00:04
@Damon Unfortunately, just copying over the bitfield is *not* enough. Even disregarding special values like INFs and NaNs, you would still have to handle overflows and underflows (and of course the different exponent biases). I agree it's easier when you don't care about strict IEEE conformance and every special case, but copying some bits still falls short. — Christian Rau, Mar 12 '14 at 13:54
As I recall IEEE754 16 bit floats do not have INFs or NaN representation, they use the whole range for normal numbers. An input 32bit may be set to INF or NaN but that is an easy test, `if a real number within range convert; else discard`. — Max Power, Jan 25 '22 at 23:02
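For reference, the layout Damon describes can be decoded by hand. The sketch below assumes the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits with a bias of 15, 10 mantissa bits). In that layout an all-ones exponent field encodes infinity or NaN, so those values are representable and the largest finite value is 65504. `decodeHalf` is a hypothetical helper written for illustration, not part of any library mentioned in this thread.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (an assumption, not a library function): decodes a
// binary16 bit pattern into a float by interpreting its three fields.
float decodeHalf(std::uint16_t bits)
{
    const int sign     = (bits >> 15) & 0x1;   // 1 bit
    const int exponent = (bits >> 10) & 0x1F;  // 5 bits, bias 15
    const int mantissa = bits & 0x3FF;         // 10 bits
    const float s = sign ? -1.0f : 1.0f;

    if (exponent == 0)   // zero or subnormal: mantissa * 2^-24
        return s * std::ldexp(static_cast<float>(mantissa), -24);

    if (exponent == 31)  // all-ones exponent: infinity or NaN
        return mantissa == 0 ? s * std::numeric_limits<float>::infinity()
                             : std::numeric_limits<float>::quiet_NaN();

    // Normal number: (1 + mantissa / 1024) * 2^(exponent - 15),
    // written as (1024 + mantissa) * 2^(exponent - 25).
    return s * std::ldexp(static_cast<float>(1024 + mantissa), exponent - 25);
}
```

With this decoding, 0x3C00 is 1.0, 0x7BFF is 65504 and 0x7C00 is positive infinity.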

Goz · Accepted Answer · 2018-04-13T15:12:20.310

Edit: The most obvious error appears in your VertexHalf structure. You have an element of padding. Yet when you specify your glVertexAttribPointer you specify a 0 in the stride which indicates it is tightly packed. So you can either change VertexHalf to remove the padding or change your glVertexAttribPointer to have a stride of 8 bytes.
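The effect of the mismatch can be reproduced without a GL context. The sketch below is only a model: `PaddedVertex` stands in for the question's VertexHalf (using raw 16-bit integers instead of the half class), and `fetchAttribute` is a hypothetical function that locates element i at `base + i * stride`, treating a stride of 0 as tightly packed (3 components of 2 bytes each).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for VertexHalf: three 16-bit values plus padding.
struct PaddedVertex
{
    std::uint16_t x, y, z, padding;
};
static_assert(sizeof(PaddedVertex) == 8, "three halves plus padding occupy 8 bytes");

struct Fetched
{
    std::uint16_t x, y, z;
};

// Models how a 3-component, 16-bit attribute is located in a buffer.
// A stride of 0 means "tightly packed": 3 * sizeof(uint16_t) = 6 bytes.
Fetched fetchAttribute(const void* buffer, std::size_t stride, std::size_t index)
{
    const std::size_t effectiveStride = stride ? stride : 3 * sizeof(std::uint16_t);
    Fetched out;
    std::memcpy(&out,
                static_cast<const unsigned char*>(buffer) + index * effectiveStride,
                sizeof(out));
    return out;
}
```

For two vertices {1, 2, 3, pad} and {4, 5, 6, pad}, fetching index 1 with a stride of 0 starts 6 bytes in and returns the first vertex's padding followed by 4 and 5, while a stride of sizeof(PaddedVertex) returns 4, 5, 6. In the question's code the equivalent fix is to pass sizeof(VertexHalf) as the stride argument of glVertexAttribPointer instead of 0.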

I use the following class with DirectX for float16 support and it works perfectly.

Float16.h:

#ifndef THE__FLOAT_16_H_
#define THE__FLOAT_16_H_

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

extern short FloatToFloat16( float value );
extern float Float16ToFloat( short value );

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

class Float16
{
protected:
    short mValue;
public:
    Float16();
    Float16( float value );
    Float16( const Float16& value );

    operator float();
    operator float() const;

    friend Float16 operator + ( const Float16& val1, const Float16& val2 );
    friend Float16 operator - ( const Float16& val1, const Float16& val2 );
    friend Float16 operator * ( const Float16& val1, const Float16& val2 );
    friend Float16 operator / ( const Float16& val1, const Float16& val2 );

    Float16& operator =( const Float16& val );
    Float16& operator +=( const Float16& val );
    Float16& operator -=( const Float16& val );
    Float16& operator *=( const Float16& val );
    Float16& operator /=( const Float16& val );
    Float16& operator -();
};

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

inline Float16::Float16()
{
}

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

inline Float16::Float16( float value )
{
    mValue  = FloatToFloat16( value );
}

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

inline Float16::Float16( const Float16 &value )
{
    mValue  = value.mValue;
}

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

inline Float16::operator float()
{
    return Float16ToFloat( mValue );
}

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

inline Float16::operator float() const
{
    return Float16ToFloat( mValue );
}

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

inline Float16& Float16::operator =( const Float16& val )
{
    mValue  = val.mValue;
    return *this;
}

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

inline Float16& Float16::operator +=( const Float16& val )
{
    *this   = *this + val;
    return *this;
}

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

inline Float16& Float16::operator -=( const Float16& val )
{
    *this   = *this - val;
    return *this;

}

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

inline Float16& Float16::operator *=( const Float16& val )
{
    *this   = *this * val;
    return *this;
}

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

inline Float16& Float16::operator /=( const Float16& val )
{
    *this   = *this / val;
    return *this;
}

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

inline Float16& Float16::operator -()
{
    *this   = Float16( -(float)*this );
    return *this;
}

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/
/*+----+                                 Friends                                       +----+*/
/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

inline Float16 operator + ( const Float16& val1, const Float16& val2 )
{
    return Float16( (float)val1 + (float)val2 );
}

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

inline Float16 operator - ( const Float16& val1, const Float16& val2 )
{
    return Float16( (float)val1 - (float)val2 );
}

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

inline Float16 operator * ( const Float16& val1, const Float16& val2 )
{
    return Float16( (float)val1 * (float)val2 );
}

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

inline Float16 operator / ( const Float16& val1, const Float16& val2 )
{
    return Float16( (float)val1 / (float)val2 );
}

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/


#endif

Float16.cpp:

#include "Types/Float16.h"

#include <cstring> // for memcpy
//#include <d3dx9.h>

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

short FloatToFloat16( float value )
{
    short   fltInt16;
    int     fltInt32;
    memcpy( &fltInt32, &value, sizeof( float ) );
    fltInt16    =  ((fltInt32 & 0x7fffffff) >> 13) - (0x38000000 >> 13);
    fltInt16    |= ((fltInt32 & 0x80000000) >> 16);

    return fltInt16;
}

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

float Float16ToFloat( short fltInt16 )
{
    int fltInt32    =  ((fltInt16 & 0x8000) << 16);
    fltInt32        |= ((fltInt16 & 0x7fff) << 13) + 0x38000000;

    float fRet;
    memcpy( &fRet, &fltInt32, sizeof( float ) );
    return fRet;
}

/*+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+*/

I already have a float32-float16 conversion, courtesy of Jherico. The problem is getting one that OpenGL can read. — Haydn V. Harach, Mar 09 '14 at 17:51
@HaydnV.Harach: Why wouldn't OpenGL be able to read it? OpenGL just expects the data to be in the correct bit pattern and be 16-bits per entry. That data could be stored in shorts or packed into floats for all OpenGL cares ... — Goz, Mar 10 '14 at 09:43
@HaydnV.Harach: I've added my class implementation of the float16. I've been using it under DirectX for a while and haven't noticed any problems. — Goz, Mar 10 '14 at 09:58
I tested it out, but when I convert a number to a half and then back to a float, I get 0.0f. — Haydn V. Harach, Mar 10 '14 at 19:10
@HaydnV.Harach: Fair point, I've buggered something up there, hold on, I'll look into it ... — Goz, Mar 10 '14 at 22:42
@HaydnV.Harach: OK after some more digging around it seems the code I have on mac is waaaay out of date. Try the above. — Goz, Mar 10 '14 at 23:15
Though it's worth pointing out that mine doesn't do any overflow protection etc ... It's very fast provided you can be sure that the input data is valid. — Goz, Mar 10 '14 at 23:17
I can convert a float to a half and back to a float safely, but it mangles my vertices just like the other implementations. — Haydn V. Harach, Mar 11 '14 at 00:02
@HaydnV.Harach: is the object definitely small enough to be represented by halfs? — Goz, Mar 11 '14 at 06:45
Yes, it's a cube made up entirely of edges, each vertex position is (+-1.0f, +-1.0f, +-1.0f). — Haydn V. Harach, Mar 11 '14 at 06:48
@HaydnV.Harach: Can I suggest you post up your model generation and rendering code? — Goz, Mar 11 '14 at 08:03
@HaydnV.Harach: Check my edit, I'm pretty sure I've spotted your error. — Goz, Mar 11 '14 at 18:48
I'll be damned, you were absolutely right. Thank you so much! PS, I like the simplicity of your implementation, but what are some concerns I should have about "valid input data"? Just make sure that it isn't NaN/INF? — Haydn V. Harach, Mar 11 '14 at 19:15
@HaydnV.Harach: Basically you need to be cautious that your floats aren't too large or too small (Maximum/Minimum is +/-65504, I believe). Above these values you will get totally invalid results as things will start to "wrap around". Below the minimum representation you'll keep getting zeros rather than anything useful. A better, though as a result slower, implementation would provide overflow protection (round to infinity, for example). — Goz, Mar 11 '14 at 22:39
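The overflow protection mentioned in the last comment could look roughly like the sketch below. It is not the answer's FloatToFloat16; `floatToHalfChecked` is a hypothetical name, and the choices made here (overflow saturates to infinity, NaN stays NaN, tiny values become subnormals or zero, rounding is to nearest with ties to even) are assumptions about what a stricter conversion might want.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical checked conversion (an assumption, not the answer's code).
std::uint16_t floatToHalfChecked(float value)
{
    std::uint32_t f;
    std::memcpy(&f, &value, sizeof f);

    const std::uint32_t sign    = (f >> 16) & 0x8000u;
    const std::uint32_t absBits = f & 0x7FFFFFFFu;
    std::uint32_t half;

    if (absBits > 0x7F800000u) {            // NaN stays NaN
        half = 0x7E00u;
    } else if (absBits >= 0x47800000u) {    // |value| >= 65536 or infinity
        half = 0x7C00u;                     // saturate to infinity
    } else if (absBits >= 0x38800000u) {    // normal half range, >= 2^-14
        const std::uint32_t rebased   = absBits - 0x38000000u; // bias 127 -> 15
        const std::uint32_t remainder = rebased & 0x1FFFu;     // 13 dropped bits
        half = rebased >> 13;
        // Round to nearest even; a carry may legitimately reach infinity.
        if (remainder > 0x1000u || (remainder == 0x1000u && (half & 1u)))
            ++half;
    } else if (absBits >= 0x33000000u) {    // subnormal half range, >= 2^-25
        const std::uint32_t shift     = 126u - (absBits >> 23);  // 14..24
        const std::uint32_t mant      = (absBits & 0x007FFFFFu) | 0x00800000u;
        const std::uint32_t remainder = mant & ((1u << shift) - 1u);
        const std::uint32_t halfway   = 1u << (shift - 1u);
        half = mant >> shift;
        if (remainder > halfway || (remainder == halfway && (half & 1u)))
            ++half;
    } else {                                // too small: flush to signed zero
        half = 0u;
    }
    return static_cast<std::uint16_t>(sign | half);
}
```

Under these assumptions 1.0f maps to 0x3C00, 65504.0f to 0x7BFF, and anything that would wrap around in the unchecked version (for example 1.0e6f) maps to 0x7C00, which is infinity.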

score 4 · Answer 2 · answered Mar 05 '14 at 22:41

The GLM library supports half-float types. The prefix used is 'h', so where glm::vec3 is a 3-element vector of floating-point values, glm::hvec3 is a 3-element vector of half-floats.

Jherico

I looked through glm and I can't seem to find an implementation for the `half` type. Even in half_float.inl, I can't seem to find it... – Haydn V. Harach Mar 05 '14 at 22:57
I copied glm's implementation, but I'm getting some weird results. For instance, when I convert "0.9001f" to a half and then back to a float, I get "0.899902f". Is this something inherent to using 16-bit floats, or is the implementation flawed? – Haydn V. Harach Mar 06 '14 at 00:01
Yes, it's inherent to 16 bit floats. You have fewer bits of precision so you're going to have a larger potential difference between the input float and the output float. You get the same thing with normal floats, just with more digits of precision. – Jherico Mar 06 '14 at 01:43
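The 0.9001 case can be checked with a little arithmetic. A half keeps 11 significant bits (10 stored plus the implicit leading 1), so between 0.5 and 1.0 representable values are spaced 2^-11 ≈ 0.000488 apart, and 0.9001 lands on 1843/2048 = 0.89990234375. The sketch below reproduces that rounding; `quantizeToHalfPrecision` is a hypothetical helper that models only the precision of a half, not its limited range.

```cpp
#include <cmath>

// Hypothetical helper: rounds x to the nearest value with an 11-bit
// significand, ignoring the exponent range limits of a real half.
float quantizeToHalfPrecision(float x)
{
    if (x == 0.0f || !std::isfinite(x))
        return x;

    int exponent = 0;
    // x = fraction * 2^exponent with 0.5 <= |fraction| < 1
    const float fraction = std::frexp(x, &exponent);
    // Keep 11 significant bits; nearbyint uses the current rounding mode,
    // which is round-to-nearest-even by default.
    const float scaled = std::nearbyint(std::ldexp(fraction, 11));
    return std::ldexp(scaled, exponent - 11);
}
```

Calling quantizeToHalfPrecision(0.9001f) yields 0.89990234375f, which prints as 0.899902 with the default six significant digits.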
Should I have any concerns about sending these to OpenGL? – Haydn V. Harach Mar 06 '14 at 03:16
I just tested it out, and no, it didn't work. What used to be a nice cube is now vertex soup. – Haydn V. Harach Mar 07 '14 at 00:26
OpenEXR also has a 'half' class that is OpenGL compatible. This looks like a copy of it here: http://www.sidefx.com/docs/hdk12.1/half_8h_source.html – Dithermaster Mar 07 '14 at 01:46
I need the implementation for "convert", which isn't in that file. – Haydn V. Harach Mar 07 '14 at 02:25
What is the performance like for glm::half? I can't imagine it's very fast if there's no hardware support for it. Is it more performant in large arrays, where data locality comes into play? If so, how big does an array have to be to justify its use? – 16807 Oct 17 '18 at 22:44
The point isn't really to start using `glm::half` where you might otherwise use float. It's typically just a mechanism for storing data that you might then be passing on to the GPU. Having an array of `glm::half` is going to be easier to deal with in the debugger than just having an unsigned char* or void* to a bunch of bytes that make no sense. – Jherico Oct 18 '18 at 00:52

score 0 · Answer 3 · answered Mar 10 '14 at 14:33

The only library I know of that has an implementation of half is OpenEXR. The good thing is that its half implementation has the functions you are looking for, and it even works with the NVIDIA Cg Toolkit.

The bad thing is that I don't know whether the half type is compatible out of the box with the OpenGL version you use (in theory it should be), so you should do some testing before you decide to use it.

Raxvan

I just tested it out by copying OpenEXR's implementation _exactly_, and while I can convert a float to half and back to a float with minimal loss of data, uploading it to OpenGL mangles my vertices (same result as jherico's tip). – Haydn V. Harach Mar 10 '14 at 19:39
@Haydn V. Harach: what opengl version are you using ? Are you sure the version you are using supports `halfs` ? – Raxvan Mar 10 '14 at 20:23
OpenGL 3.3.0, GLSL 3.30 NVIDIA via Cg compiler, Geforce GTX 560 Ti. My card supports OpenGL 4.3 (I think 4.4 as well), but I specifically request a 3.3 context. – Haydn V. Harach Mar 10 '14 at 22:26
@Haydn V. Harach can you please update your question with a minimum sample on the initialization and rendering of the vertex data (shortest working example) ? OpenEXR half's should be compatible with the CG format and if it's not working i suspect the problem lies elsewhere – Raxvan Mar 11 '14 at 08:40
