When we write inline PTX assembly in CUDA code that is otherwise C/C++, e.g.:
__device__ __inline__ uint32_t bfind(uint32_t val)
{
    uint32_t ret;
    asm ("bfind.u32 %0, %1;" : "=r"(ret) : "r"(val));
    return ret;
}
we can add the volatile keyword after asm, e.g.:
__device__ __inline__ uint32_t bfind(uint32_t val)
{
    uint32_t ret;
    asm volatile ("bfind.u32 %0, %1;" : "=r"(ret) : "r"(val));
    return ret;
}
The CUDA documentation on inline PTX assembly says:
The compiler assumes that an asm() statement has no side effects except to change the output operands. To ensure that the asm is not deleted or moved during generation of PTX, you should use the volatile keyword.
I don't understand what that means. So,
- Why would my asm() be deleted? Or rather, if the compiler notices it has no effect, why should I mind it being deleted?
- Why is it a problem if my asm() is moved during the generation of PTX? That's part of the optimization process, isn't it?
- How would one characterize the compiler's behavior more exactly when facing non-volatile and volatile asm() instructions respectively?
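To make the question concrete, here is a minimal sketch of the kind of case I assume the documentation has in mind. It is my own guess, not something taken from the documentation. The bfind wrapper is a pure computation whose result depends only on its input, so plain asm looks sufficient there. Reading the %clock special register is different: the statement has no inputs, so a compiler that believes the asm has no effects beyond its output operand could presumably treat two such reads as interchangeable, merge them, or move them. The names read_clock, demo_kernel and run_demo are hypothetical helpers introduced only for this illustration.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Pure computation: the output depends only on `val`, so a non-volatile asm
// should be safe to delete (if unused) or move.
__device__ __inline__ uint32_t bfind(uint32_t val)
{
    uint32_t ret;
    asm ("bfind.u32 %0, %1;" : "=r"(ret) : "r"(val));
    return ret;
}

// Hypothetical helper: reads the %clock special register. The value changes
// over time even though the statement has no inputs, which the compiler cannot
// see from the operand list; `volatile` asks it not to delete or move the read.
__device__ __inline__ uint32_t read_clock()
{
    uint32_t t;
    asm volatile ("mov.u32 %0, %%clock;" : "=r"(t));
    return t;
}

// Hypothetical kernel: stores the bfind result and the difference between two
// clock reads. Without `volatile` on read_clock, I assume the two reads could
// be merged into one, making the difference zero.
__global__ void demo_kernel(uint32_t val, uint32_t* out)
{
    uint32_t start = read_clock();
    uint32_t pos   = bfind(val);
    uint32_t stop  = read_clock();
    out[0] = pos;
    out[1] = stop - start;
}

// Hypothetical host helper: launches one thread and returns the bfind result.
// Error checking is omitted to keep the sketch short.
uint32_t run_demo(uint32_t val)
{
    uint32_t* d_out = nullptr;
    uint32_t h_out[2] = {0, 0};
    cudaMalloc(&d_out, sizeof(h_out));
    demo_kernel<<<1, 1>>>(val, d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out[0];
}
```

Note that since bfind itself is not volatile here, I would not expect this to guarantee that the bfind instruction stays between the two clock reads; the sketch is only meant to show where dropping volatile might change observable behavior.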