Copying to and from half-precision F16 allocation in android (renderscript)

Question

I also asked this on the android-developers Google Group, with no luck (https://groups.google.com/forum/#!topic/android-developers/Rh_L9Jv_S8Q)

I'm trying to figure out how to do half-precision arithmetic in RenderScript using types like half and half4. The only problem seems to be getting the numbers from Java into RenderScript and back.

The Java Code:

private float[] input;
private float[] half_output;
private RenderScript mRS;
private ScriptC_mono mScript;
private final int dimen = 15;
...

//onCreate
input = new float[dimen * dimen * 3];      //later loaded from file 182.24 3.98 105.83 226.08 15.2 80.01...
half_output = new float[dimen * dimen * 3];
...

//function calling renderscript
mRS = RenderScript.create(this);
ScriptC_halfPrecision mScript = new ScriptC_halfPrecision(mRS);

Allocation input2 = Allocation.createSized(mRS, Element.F16(mRS), dimen * dimen * 3);
input2.copyFromUnchecked(input);            //copy float values to F16 allocation

Allocation halfIndex = Allocation.createSized(mRS, Element.F16(mRS), dimen * dimen);
Type.Builder half_output_type = new Type.Builder(mRS, Element.F16(mRS)).setX(dimen * dimen * 3);
Allocation output3 = Allocation.createTyped(mRS, half_output_type.create());

mScript.set_half_in(input2);
mScript.set_half_out(output3);
mScript.forEach_half_operation(halfIndex);

output3.copy1DRangeToUnchecked(0, dimen * dimen * 3, half_output);  //copy F16 allocation back to float array

The RenderScript:

#pragma version(1)
#pragma rs java_package_name(com.example.android.rs.hellocompute)

rs_allocation half_in;
rs_allocation half_out;

half __attribute__((kernel)) half_operation(uint32_t x) {
    half4 out = rsGetElementAt_half4(half_in, x);

    out.x /= 2.0;
    out.y /= 2.0;
    out.z /= 2.0;
    out.w /= 2.0;

    rsSetElementAt_half4(half_out, out, x);
}

I also tried this instead of the last line shown in the Java code:

float temp_half[] = new float[1];
for (int i = 0; i < dimen * dimen * 3; ++i) {     //copy F16 allocation back to float array
    output3.copy1DRangeToUnchecked(i, 1, temp_half);
    half_output[i]=temp_half[0];
}

All of the above code works perfectly with float4 variables in the RenderScript and F32 allocations in Java. That makes sense, because a RenderScript float and a Java float are the same 32-bit type. But going from a Java float (Java has no half type) to a RenderScript half and back again is proving very difficult. Can anyone tell me how to do it?

Both versions of the Java code produce seemingly random values in the half_output array. They are clearly not random, because they are the same every time I run it, no matter what operation the half_operation(uint32_t x) function performs. I've tried changing out.x /= 2.0; (and the corresponding y, z, w lines) to out.x /= 2000000.0; or out.x *= 2000000.0;, and the values that end up in half_output are still identical on every run.

With the input 182.24 3.98 105.83 226.08 15.2 80.01..., here is what each version produces.

Using this Java:

output3.copy1DRangeToUnchecked(0, dimen * dimen * 3, half_output);  //copy F16 allocation back to float array

The resulting half_output is 46657.44 27094.48 3891.45 965.1825 36223.44 14959.08...

Using this Java:

float temp_half[] = new float[1];
for (int i = 0; i < dimen * dimen * 3; ++i) {     //copy F16 allocation back to float array
    output3.copy1DRangeToUnchecked(i, 1, temp_half);
    half_output[i]=temp_half[0];
}

The resulting half_output is 2.3476E-41 2.5546E-41 6.2047E-41 2.5407E-41 1.9802E-41 2.4914E-41...

Again, these results are the same no matter what I change out.x /= 2.0; to.

score 0 · Answer 1 · edited May 23 '17 at 11:53

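The problem is that this copy does not do any conversion. It just puts the raw bytes of your FP32 source values into the allocation's memory, so when the kernel interprets those bytes as FP16, the values are wrong. Copying the F16 output back into a float[] has the same problem in reverse.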

input2.copyFromUnchecked(input);            //copy float values to F16 allocation
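To see what the raw copy actually does, here is a small plain-Java sketch (the class and method names are made up for illustration). It assumes a little-endian device, so the low 16 bits of each float land in the first F16 element and the high 16 bits in the next one. It also models the reverse direction: copying one 2-byte F16 element into a float whose upper two bytes are zero (as with the new float[1] in the question's loop) produces a denormal equal to the bit pattern times 2^-149. That is consistent with the 2.3476E-41 2.5546E-41... values in the question, all of which are below 65535 × 2^-149 ≈ 9.18E-41.

```java
public class RawCopyDemo {
    // A raw byte copy of one float into an F16 allocation splits its 4 bytes
    // into two 16-bit words; on a little-endian device the low word comes first.
    static short[] halfWordsOf(float f) {
        int bits = Float.floatToRawIntBits(f);
        return new short[] { (short) (bits & 0xFFFF), (short) (bits >>> 16) };
    }

    // Reverse direction: one F16 element (2 bytes) copied into the low bytes of
    // a float whose upper bytes are zero reads back as the denormal bits * 2^-149.
    static float floatFromHalfWord(short h) {
        return Float.intBitsToFloat(h & 0xFFFF);
    }

    public static void main(String[] args) {
        short[] words = halfWordsOf(1.0f);
        // prints "1.0f -> 0x0000 0x3F80"
        System.out.printf("1.0f -> 0x%04X 0x%04X%n", words[0] & 0xFFFF, words[1] & 0xFFFF);
        // FP16 1.0 (bit pattern 0x3C00) seen through a float: a value around 2E-41
        System.out.println(floatFromHalfWord((short) 0x3C00));
    }
}
```

Read as FP16, the words 0x0000 and 0x3F80 are 0.0 and 1.875, so an input of 1.0f would reach the kernel as two unrelated half values.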

You could port something like the answer to this question to RenderScript:

32-bit to 16-bit Floating Point Conversion

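If your input has no denormals, infinities, NaNs, or values that overflow or underflow the FP16 range, this seems like an OK solution (f is the float to convert):

uint32_t x = *((uint32_t*)&f);                          // raw bits of the float f
uint16_t h = ((x>>16)&0x8000)                           // sign bit
           | ((((x&0x7f800000)-0x38000000)>>13)&0x7c00) // exponent rebiased from 127 to 15
           | ((x>>13)&0x03ff);                          // top 10 mantissa bits (truncated)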
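If you would rather convert on the Java side (for example, packing the values into a short[] and copying that into the F16 allocation with copyFromUnchecked(short[])), the same bit trick ports directly to Java. This is a minimal sketch with hypothetical names. Like the C expression, it truncates the mantissa instead of rounding, and it is only valid for normal floats with magnitude in [2^-14, 2^16); in particular 0.0f comes out as 0x4000, which is 2.0 in FP16. The inverse covers the same range plus signed zero, for turning the F16 output back into Java floats.

```java
public class HalfBits {
    // Port of the C bit trick. NOT handled: 0, denormals, Inf, NaN, and
    // magnitudes outside [2^-14, 2^16). The mantissa is truncated, not rounded.
    static short floatToHalfBits(float f) {
        int x = Float.floatToRawIntBits(f);
        int sign = (x >>> 16) & 0x8000;
        int exponent = (((x & 0x7f800000) - 0x38000000) >>> 13) & 0x7c00; // rebias 127 -> 15
        int mantissa = (x >>> 13) & 0x03ff;                                 // keep top 10 bits
        return (short) (sign | exponent | mantissa);
    }

    // Inverse for normal FP16 values and signed zero only
    // (FP16 denormals, Inf and NaN are not decoded correctly).
    static float halfBitsToFloat(short h) {
        int bits = h & 0xFFFF;
        int sign = (bits & 0x8000) << 16;
        if ((bits & 0x7fff) == 0) {
            return Float.intBitsToFloat(sign); // +0.0f or -0.0f
        }
        // Add 112 << 10 to rebias the exponent from 15 to 127, then shift the
        // exponent and mantissa into float position.
        return Float.intBitsToFloat(sign | (((bits & 0x7fff) + 0x1c000) << 13));
    }

    public static void main(String[] args) {
        float back = halfBitsToFloat(floatToHalfBits(182.24f));
        System.out.println(back); // prints "182.125"
    }
}
```

Because the low mantissa bits are simply dropped, 182.24f comes back as 182.125f; a round-to-nearest conversion would give 182.25f.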

Really, the best solution is to have your source values stored in the file in FP16 binary format already. Read them into a Java byte[] array and copy that into the FP16 input allocation. Then, when the RenderScript kernel interprets the data as FP16, you should have no problem.
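For the file-based approach, the only thing that has to line up is the byte layout. The sketch below assumes the FP16 values are stored as little-endian 16-bit words, the native order on ARM and x86 Android devices; the class and method names are hypothetical. It is plain Java so it can be checked off-device. On Android, the resulting byte[] (or the bytes read straight from the file) would go into the allocation with something like input2.copyFromUnchecked(bytes).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Fp16Bytes {
    // Pack FP16 bit patterns into little-endian bytes, i.e. the layout a raw
    // byte copy into an F16 allocation expects on a little-endian device.
    static byte[] toBytes(short[] halfBits) {
        ByteBuffer buf = ByteBuffer.allocate(halfBits.length * 2).order(ByteOrder.LITTLE_ENDIAN);
        buf.asShortBuffer().put(halfBits); // the view inherits LITTLE_ENDIAN
        return buf.array();
    }

    // Decode little-endian bytes (e.g. a file of FP16 values) back into bit patterns.
    static short[] fromBytes(byte[] bytes) {
        short[] out = new short[bytes.length / 2];
        ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(out);
        return out;
    }

    public static void main(String[] args) {
        // FP16 1.0 is 0x3C00 and FP16 -2.0 is 0xC000
        byte[] bytes = toBytes(new short[] { (short) 0x3C00, (short) 0xC000 });
        for (byte b : bytes) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println(); // prints "00 3C 00 C0 "
    }
}
```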
