I am writing code to approximate a function using a power series, and I would like to exploit #pragma unroll and the FMA instruction, like this:
__constant__ double coeff[5] = {1.0, 2.0, 3.0, 4.0, 5.0}; /* placeholder values, not the real coefficients */

__device__ double some_function(double x) {
    /* Horner's scheme: each iteration is one multiply-add, y = y*x + coeff[i] */
    double y = coeff[0];
#pragma unroll
    for (int i = 1; i < 5; i++)
        y = y * x + coeff[i];
    return y;
}
The code compiles to PTX assembly like this:
ld.const.f64 %fd33, [coeff];
ld.const.f64 %fd34, [coeff+8];
fma.rn.f64 %fd35, %fd33, %fd32, %fd34;
ld.const.f64 %fd36, [coeff+16];
fma.rn.f64 %fd37, %fd35, %fd32, %fd36;
ld.const.f64 %fd38, [coeff+24];
fma.rn.f64 %fd39, %fd37, %fd32, %fd38;
ld.const.f64 %fd40, [coeff+32];
fma.rn.f64 %fd41, %fd39, %fd32, %fd40;
I want to avoid constant memory and use immediate values instead, like this:
mov.f64 %fd248, 0d3ED0EE258B7A8B04;
mov.f64 %fd249, 0d3EB1380B3AE80F1E;
fma.rn.f64 %fd250, %fd249, %fd247, %fd248;
mov.f64 %fd251, 0d3EF3B2669F02676F;
fma.rn.f64 %fd252, %fd250, %fd247, %fd251;
mov.f64 %fd253, 0d3F1745CBA9AB0956;
fma.rn.f64 %fd254, %fd252, %fd247, %fd253;
mov.f64 %fd255, 0d3F3C71C72D1B5154;
fma.rn.f64 %fd256, %fd254, %fd247, %fd255;
mov.f64 %fd257, 0d3F624924923BE72D;
fma.rn.f64 %fd258, %fd256, %fd247, %fd257;
mov.f64 %fd259, 0d3F8999999999A3C4;
fma.rn.f64 %fd260, %fd258, %fd247, %fd259;
mov.f64 %fd261, 0d3FB5555555555554;
fma.rn.f64 %fd262, %fd260, %fd247, %fd261;
I know that I can use a #define macro to do so, but it is very inconvenient when there are many coefficients.
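For reference, the #define approach I mean looks roughly like the sketch below. The names C0..C4, HD, and some_function_macro are made up for illustration, and the values are the same placeholders as above. The HD guard is only there so the snippet also builds with a plain host C++ compiler; under nvcc it expands to __host__ __device__.

```cpp
/* Hypothetical sketch: HD, C0..C4 and some_function_macro are illustrative names. */
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD /* plain host compiler: no CUDA qualifiers */
#endif

/* One macro per coefficient (placeholder values, not the real ones). */
#define C0 1.0
#define C1 2.0
#define C2 3.0
#define C3 4.0
#define C4 5.0

HD inline double some_function_macro(double x) {
    /* Manually unrolled Horner evaluation; every coefficient is a literal
       in the source, so no array has to be read at run time. */
    double y = C0;
    y = y * x + C1;
    y = y * x + C2;
    y = y * x + C3;
    y = y * x + C4;
    return y;
}
```

With 20 or 30 coefficients, each one needs its own macro and its own line in the function body, which is the part I would like to avoid.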
Is there any C data type modifier (or compiler option) that would make the compiler turn my coefficient array into immediate values instead of using constant memory?
I tried static double, static __constant__ double, and static __device__ double, and none of them works.
My final question: I assume that using immediate values should be faster than using constant memory. Is that correct?