Both declarations take up the same amount of space. Either method should be usable from a code-correctness standpoint.
half2 should probably be preferred, for at least two reasons (both are illustrated in the sketch below):

1. When doing a warp-wide load, one half2 per thread will be a more efficient load (or store) than one half per thread.
2. Some operations like addition and multiplication will only achieve full (i.e. highest) throughput on the architectures that support such operations (cc 5.3, and cc 6.1 and higher) when done on a half2 type (ignoring Volta tensor core operations for this discussion).
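As a minimal sketch of both points, each thread handles one half2: the load and store are single 32-bit transactions instead of two 16-bit ones, and __hadd2 from cuda_fp16.h adds both packed values in one operation. The kernel and its names here are hypothetical, not from the question:

#include <cuda_fp16.h>

__global__ void add_scalar(const half2 *in, half2 *out, half2 a, int n2)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n2) {                  // n2 = number of half2 elements
        half2 v = in[idx];           // one 32-bit vectorized load per thread
        out[idx] = __hadd2(v, a);    // packed add: both half values at once
    }
}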
In other respects, the decision about whether to use one or the other is much the same as for any other vector type, such as int vs. int2. I'm not going to try to give a complete summary of the motivations for using vector types here.
The underlying memory storage pattern for the two examples given would be the same, so in general a properly aligned pointer to half should be castable (say, on proper/alternating index boundaries) to half2, and likewise a half2 pointer should be safely castable to half.
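For example, a sketch of the half-to-half2 direction; the array name is made up here, and the __align__(4) qualifier guarantees the alignment that the half2 view requires:

__device__ __align__(4) half hbuf[32];        // even element count, aligned for half2
...
half2 *pv = reinterpret_cast<half2 *>(hbuf);  // cast at an even index boundary
half2 pair = pv[0];                           // reads hbuf[0] and hbuf[1] together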
Even if you choose to use half2, usage of half may be appropriate or unavoidable in certain situations, for example if you need to modify an individual quantity, or for certain CUBLAS function calls. In that case, at least with the most recent versions of CUDA, it is possible to convert half2 to half (and vice versa) as you would with any other vector type:
__device__ half2 array[16];
...
half2 myval = array[0];
half first = myval.x;
half second = myval.y;
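In the other direction, two half quantities can be packed back into a half2 using the __halves2half2 intrinsic from cuda_fp16.h (continuing the variable names above):

half2 repacked = __halves2half2(first, second);  // first -> low half, second -> high half
array[0] = repacked;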
and as already mentioned, a half2 pointer can be safely cast to half:
half2 *data = array+2;
half *hdata = reinterpret_cast<half *>(data);
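After that cast, the half view aliases the same storage, so (continuing the snippet above):

half h0 = hdata[0];  // the same 16-bit quantity as the low half (.x) of array[2]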
This somewhat related question may be of interest also.