I have the following function with inline PTX assembly. It works fine in Debug mode in a 32-bit Visual Studio 2008 build:
__device__ void add(int* pa, int* pb)
{
    asm(".reg .u32 s<3>;"::);                  // declare PTX registers s0..s2
    asm(".reg .u32 r<14>;"::);                 // declare PTX registers r0..r13
    asm("ld.global.b32 s0, [%0];"::"r"(&pa)); // load addresses of pa, pb
    printf(...);
    asm("ld.global.b32 s1, [%0];"::"r"(&pb));
    printf(...);
    asm("ld.global.b32 r1, [s0+8];"::);        // load word at byte offset 8 of pa
    printf(...);
    asm("ld.global.b32 r2, [s1+8];"::);        // load word at byte offset 8 of pb
    printf(...);
    ... // perform some operations
}
pa and pb are arrays allocated in device global memory, for example:
__device__ int pa[3] = {0, 0x927c0000, 0x20000011};
__device__ int pb[3] = {0, 0xbb900000, 0x2000000b};
However, in Release mode this code fails at the line asm("ld.global.b32 r1, [s0+8];"::);
How can I correctly load function parameters with inline PTX in Release mode?
P.S. Building in Release mode with the -G flag (generate GPU debug information) makes the code run correctly. Thank you.
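A minimal sketch of the commonly used pattern, assuming a current CUDA toolkit and a GPU of compute capability 2.0 or later. A likely culprit in the code above is "r"(&pa): it passes the address of the local parameter variable pa, not the array pointer. In addition, values flow between separate asm statements through PTX registers the compiler knows nothing about, and an optimizing (non -G) build is free to rearrange that. The sketch instead passes the pointer value itself as an input operand and returns results through output operands. Where named PTX temporaries are wanted, it declares them inside a { } scope in the same asm statement that uses them. The names PTR_CONSTRAINT, load_word2, add_word2, demo_kernel and run_demo are hypothetical. The addition in add_word2 only stands in for the elided operations. The arrays are declared unsigned int because 0x927c0000 does not fit in a 32-bit int. Applying ld.global to an ordinary (generic) CUDA pointer relies on global addresses coinciding with generic ones, as CUDA's own __ldg implementation does.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Pointer operands must match the device pointer width: "l" (.u64 register)
// in 64-bit builds, "r" (.u32 register) in 32-bit builds.
// PTR_CONSTRAINT is a hypothetical name chosen for this sketch.
#if defined(_WIN64) || defined(__LP64__)
#define PTR_CONSTRAINT "l"
#else
#define PTR_CONSTRAINT "r"
#endif

// Same values as in the question; unsigned so that every constant fits.
__device__ unsigned int pa[3] = {0, 0x927c0000, 0x20000011};
__device__ unsigned int pb[3] = {0, 0xbb900000, 0x2000000b};

// Pass the pointer value (not its address) as an input operand and
// receive the loaded word (byte offset 8) through an output operand.
__device__ unsigned int load_word2(const unsigned int* p)
{
    unsigned int w;
    asm volatile("ld.global.u32 %0, [%1+8];" : "=r"(w) : PTR_CONSTRAINT(p));
    return w;
}

// Named PTX temporaries are declared inside a { } scope in the same asm
// statement that uses them. The addition is an illustrative operation only.
__device__ unsigned int add_word2(const unsigned int* a, const unsigned int* b)
{
    unsigned int sum;
    asm volatile("{\n\t"
                 ".reg .u32 t1, t2;\n\t"
                 "ld.global.u32 t1, [%1+8];\n\t"
                 "ld.global.u32 t2, [%2+8];\n\t"
                 "add.u32 %0, t1, t2;\n\t"
                 "}"
                 : "=r"(sum)
                 : PTR_CONSTRAINT(a), PTR_CONSTRAINT(b));
    return sum;
}

__global__ void demo_kernel(unsigned int* out)
{
    out[0] = load_word2(pa);
    out[1] = load_word2(pb);
    out[2] = add_word2(pa, pb);
}

// Host helper: runs demo_kernel once and copies the three results into out.
// Returns false if any CUDA call fails.
bool run_demo(unsigned int out[3])
{
    unsigned int* d_out = nullptr;
    if (cudaMalloc(&d_out, 3 * sizeof(unsigned int)) != cudaSuccess) return false;
    demo_kernel<<<1, 1>>>(d_out);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(out, d_out, 3 * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

With the data above, run_demo should fill out with 0x20000011, 0x2000000b and their sum 0x4000001c. The asm statements carry every dependency in their operand lists, so the result should not depend on whether -G is used.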