
I am not able to find anything about fp16 support in the Thrust CUDA template library. Even the roadmap page has nothing about it: https://github.com/thrust/thrust/wiki/Roadmap

But I assume somebody has probably figured out how to overcome this problem, since fp16 support has been in CUDA for more than 6 months.

As of today, I rely heavily on Thrust in my code and have templated nearly every class I use in order to ease fp16 integration. Unfortunately, absolutely nothing works out of the box for the half type, not even this simple sample code:

//STL
#include <iostream>
#include <cstdlib>

//Cuda
#include <cuda_runtime_api.h>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <cuda_fp16.h>
#define T half //works when float is used instead of half

int main(int argc, char* argv[])
{
        thrust::device_vector<T> a(10, 1.0f);                      // 10 elements initialised to 1
        float t = thrust::reduce(a.cbegin(), a.cend(), (float)0);  // sum into a float accumulator
        std::cout<<"test = "<<t<<std::endl;
        return EXIT_SUCCESS;
}

This code does not compile because there seems to be no implicit conversion from float to half or from half to float. However, there appear to be intrinsics in CUDA that allow for an explicit conversion.
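
For example, in device code the explicit conversion looks something like this (a minimal sketch using the __half2float and __float2half intrinsics from cuda_fp16.h; the kernel and variable names are only illustrative):

#include <cuda_fp16.h>

__global__ void scale(half* data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        // explicit round trip: half -> float, arithmetic in float, float -> half
        float x = __half2float(data[i]);
        data[i] = __float2half(x * factor);
    }
}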

Why can't I simply overload the half and float constructors in some CUDA header file, adding the previous intrinsics like this:

float::float( half a )
{
  return  __half2float( a ) ;
}

half::half( float a )
{
  return  __float2half( a ) ;
}

My question may seem basic, but I don't understand why I haven't found much documentation about it.

Thank you in advance

Tobbey
  • The second (really separate) question is simple - intrinsic (i.e. non-class) types do not have constructors. You can't specialise a constructor for `float` because `float`, by definition, doesn't have a constructor. See http://stackoverflow.com/questions/5113365/do-built-in-types-have-default-constructors – talonmies Mar 24 '16 at 10:18
  • Ok so basically, a developer is not allowed to define "implicit" conversion between intrinsic types. But isn't it another way to give a hint to the compiler to do so other than building its own cuda compiler ? – Tobbey Mar 24 '16 at 10:32
  • Sorry, but I didn't understand anything of that second sentence. – talonmies Mar 24 '16 at 11:22
  • Sorry for my bad english, I was asking if there was another solution apart from rewriting my own cuda compiler. Your next answer is more or less what I was expecting. – Tobbey Mar 24 '16 at 12:13
  • This is something I'm looking into (I maintain Thrust at NVIDIA) – blelbach Jan 27 '18 at 14:13
  • Cool, I am looking forward to it, especially since I discovered it was so simple to manipulate fp16 in TensorFlow. – Tobbey Jan 27 '18 at 16:12

1 Answer


The very short answer is that what you are looking for doesn't exist.

The slightly longer answer is that Thrust is intended to work on fundamental and POD types only, and the CUDA fp16 half is not a POD type. It might be possible to make two custom classes (one for the host and one for the device) which implement all the required object semantics and arithmetic operators to work correctly with Thrust, but it would be a significant effort to do it (and it would require writing or adapting an existing FP16 host library).
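
To give an idea of the scale of that effort, here is a very rough, incomplete sketch of such a wrapper type (the type name is illustrative). The device paths use the cuda_fp16.h intrinsics; the host paths would need a software FP16 conversion, which is only stubbed out here:

#include <cuda_fp16.h>

struct half_t
{
    half value;

    __host__ __device__ half_t() {}

    __host__ __device__ half_t(float f)
    {
#ifdef __CUDA_ARCH__
        value = __float2half(f);      // device: intrinsic conversion
#else
        // host: a real implementation needs a software float->half
        // conversion here, e.g. adapted from an existing FP16 host library
#endif
    }

    __host__ __device__ operator float() const
    {
#ifdef __CUDA_ARCH__
        return __half2float(value);   // device: intrinsic conversion
#else
        return 0.0f;                  // host: placeholder, see above
#endif
    }
};

// arithmetic performed in float
__host__ __device__ inline half_t operator+(half_t a, half_t b)
{
    return half_t(float(a) + float(b));
}

Even with something like this, the host-side conversions, the remaining operators, and the interaction with Thrust's dispatch would all still have to be filled in.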

Note also that the current FP16 support is only in device code and only on compute 5.3 and newer devices. So unless you have a Tegra TX1, you can't use the FP16 library in device code anyway.
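
If you are unsure what hardware you have, a quick host-side check of the compute capability (a sketch assuming device 0) tells you whether the FP16 arithmetic path is even available:

#include <cuda_runtime_api.h>
#include <cstdio>

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess)
    {
        std::printf("cudaGetDeviceProperties failed\n");
        return 1;
    }
    // FP16 arithmetic needs compute capability 5.3 or newer
    bool fp16_capable = (prop.major > 5) || (prop.major == 5 && prop.minor >= 3);
    std::printf("%s: compute %d.%d, FP16 arithmetic %s\n",
                prop.name, prop.major, prop.minor,
                fp16_capable ? "available" : "not available");
    return 0;
}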

talonmies
  • Actually I can use it in device code for load/store-like operations, that's all. Apparently I was very dumb to simply template my application on a typename and hope for an automatic performance improvement once CUDA fp16 support was added. – Tobbey Mar 24 '16 at 12:20
  • @Tobbey: I wouldn't say it is dumb, just premature. Support for FP16 might well appear in the future, but it would require more elaborate semantic support than what is available today. Given that there will be full half precision support in the next generation hardware, I would guess that it will eventually come. Just not yet. – talonmies Mar 24 '16 at 12:33
  • Ok, thank you for putting things in perspective. I'll also give you a +1 for the link explaining trivial and standard layout in C++11, very interesting! – Tobbey Mar 24 '16 at 12:41