According to NVIDIA's Inline PTX Assembly document, the syntax for inline assembly is:
asm("temp_string" : "constraint"(output) : "constraint"(input));
Here are two examples:
asm("vadd.s32.s32.s32 %0, %1.h0, %2.h0;" : "=r"(v) : "r"(a), "r"(b));
asm("vadd.u32.u32.u32 %0.b0, %1, %2, %3;" : "=r"(v) : "r"(a), "r"(b), "r"(z));
In both examples, suffixes such as h0 or b0 follow the %n operands. I looked through the official CUDA documentation and didn't find anything about the meaning of h0 or b0. I've seen h0, h1 and b0, b1, b2, b3. I guess h0 or h1 selects a 16-bit value, while bn selects a byte. Does anyone know the exact meaning of these?
Thanks to Roger Dahl for the help. I read the PTX ISA 3.0 document and found the answer.
"h" means half-word: h0 is the low half-word (bits 0-15) of a 32-bit word, and h1 is the high half-word (bits 16-31). "b" means byte: b0, b1, b2 and b3 are the lowest byte (bits 0-7), the second byte (bits 8-15), the third byte (bits 16-23) and the highest byte (bits 24-31) of a 32-bit word.
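To make the selectors concrete, here is a small host-side sketch in plain C++ (it should also compile as CUDA host code) that models them with shifts and masks. The helper names select_half, select_byte and vadd_s32_h0_h0 are hypothetical and are not part of CUDA or PTX. The last function is my reading of the first example above: with .s32 operand types, each selected half-word is sign-extended to 32 bits before the add. Treat it as an illustration rather than a reference implementation.

```cpp
#include <cstdint>

// Hypothetical helpers (not CUDA or PTX APIs): host-side models of the
// h0/h1 and b0..b3 sub-word selectors applied to a 32-bit word.

// Half-word n of word (0 = bits 0-15, 1 = bits 16-31), zero-extended.
constexpr uint32_t select_half(uint32_t word, unsigned n) {
    return (word >> (16u * n)) & 0xFFFFu;
}

// Byte n of word (0 = bits 0-7, ..., 3 = bits 24-31), zero-extended.
constexpr uint32_t select_byte(uint32_t word, unsigned n) {
    return (word >> (8u * n)) & 0xFFu;
}

// Assumed model of "vadd.s32.s32.s32 d, a.h0, b.h0;":
// take the low half-word of each operand, sign-extend it, then add.
constexpr int32_t vadd_s32_h0_h0(uint32_t a, uint32_t b) {
    const int32_t ta = static_cast<int16_t>(select_half(a, 0));
    const int32_t tb = static_cast<int16_t>(select_half(b, 0));
    return ta + tb;
}
```

For the word 0x12345678, h0 is 0x5678 and h1 is 0x1234, while b0, b1, b2 and b3 are 0x78, 0x56, 0x34 and 0x12.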