I would like to ask how to write inline assembly for the Store-Conditional instruction in RISC-V. Some brief background first (RISCV-ISA-Specification, page 40, section 7.2):
SC writes a word in rs2 to the address in rs1, provided a valid reservation still exists on that address. SC writes zero to rd on success or a nonzero code on failure.
The instruction that we will be focusing on is SC.D (store-conditional of a 64-bit value). As shown on page 106 of RISCV-ISA-Specification, the instruction format is as follows:
00011 | aq<1> | rl<1> | rs2<5> | rs1<5> | 011 | rd<5> | 0101111
In order to use inline assembly to generate the corresponding code for the SC.D instruction, we need 3 registers. The register list can be found here.
Each register field of the instruction is 5 bits wide, which matches the 32 general-purpose registers in RISC-V: x0, x1, ..., x31. Each register also has an ABI (application binary interface) name; for instance, register x16 corresponds to the a6 register, so its 5-bit value is 10000.
I chose the following register assignment:
- rs2: a6 register (register x16, i.e. 0b10000)
- rs1: a7 register (register x17, i.e. 0b10001)
- rd: s4 register (register x20, i.e. 0b10100)
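The mapping from ABI names to register numbers can be checked with a small lookup table. This is only a sketch: the table follows the standard RISC-V calling convention names, and reg_number is a hypothetical helper name made up for illustration.

```c
#include <string.h>

/* ABI names for x0..x31, indexed by register number
 * (standard RISC-V calling convention). */
static const char *const abi_names[32] = {
    "zero", "ra", "sp",  "gp",  "tp", "t0", "t1", "t2",
    "s0",   "s1", "a0",  "a1",  "a2", "a3", "a4", "a5",
    "a6",   "a7", "s2",  "s3",  "s4", "s5", "s6", "s7",
    "s8",   "s9", "s10", "s11", "t3", "t4", "t5", "t6",
};

/* Hypothetical helper: returns the x-register number for an ABI name,
 * or -1 if the name is not in the table. */
static int reg_number(const char *abi_name) {
    for (int i = 0; i < 32; i++) {
        if (strcmp(abi_names[i], abi_name) == 0) {
            return i;
        }
    }
    return -1;
}
```

With this table, reg_number("a6"), reg_number("a7") and reg_number("s4") give 16, 17 and 20, matching the assignment in the list.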
Hence, by filling in the corresponding register bits of the original instruction, we have the following:
00011 | aq<1> | rl<1> | 10000 | 10001 | 011 | 10100 | 0101111
The two bits aq and rl specify the ordering constraints (page 40 of RISCV-ISA-Specification):
If both the aq and rl bits are set, the atomic memory operation is sequentially consistent and cannot be observed to happen before any earlier memory operations or after any later memory operations in the same RISC-V hart, and can only be observed by any other hart in the same global order of all sequentially consistent atomic memory operations to the same address domain.
So we just set both bits to 1, since we want SC.D to be sequentially consistent. Now we have the final instruction bits:
00011 | 1 | 1 | 10000 | 10001 | 011 | 10100 | 0101111
-> 00011111|00001000|10111010|00101111
0x1f 0x08 0xba 0x2f
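As a sanity check, the bit packing can be reproduced in C. This is a minimal sketch of the field layout quoted above; encode_sc_d is a hypothetical helper name, not something provided by the toolchain.

```c
#include <stdint.h>

/* Hypothetical helper: packs the SC.D fields in the order
 * 00011 | aq | rl | rs2 | rs1 | 011 | rd | 0101111.
 * Arguments wider than their field are masked. */
static uint32_t encode_sc_d(uint32_t aq, uint32_t rl,
                            uint32_t rs2, uint32_t rs1, uint32_t rd) {
    const uint32_t funct5 = 0x03; /* 00011 */
    const uint32_t funct3 = 0x3;  /* 011 selects the 64-bit (.D) width */
    const uint32_t opcode = 0x2f; /* 0101111 */
    return (funct5 << 27) | ((aq & 1u) << 26) | ((rl & 1u) << 25) |
           ((rs2 & 0x1fu) << 20) | ((rs1 & 0x1fu) << 15) |
           (funct3 << 12) | ((rd & 0x1fu) << 7) | opcode;
}
```

Calling encode_sc_d(1, 1, 16, 17, 20) returns 0x1f08ba2f, which agrees with the four bytes derived by hand.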
Since RISC-V is little-endian, the bytes have to be emitted least significant byte first, so the corresponding inline assembly is:
__asm__ volatile(".byte 0x2f, 0xba, 0x08, 0x1f");
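The byte order passed to .byte can be double-checked by splitting the 32-bit instruction word in C. This is just a sketch; to_le_bytes is a hypothetical helper name introduced for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: stores a 32-bit instruction word as 4 bytes,
 * least significant byte first (little-endian order). */
static void to_le_bytes(uint32_t insn, uint8_t out[4]) {
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(insn >> (8 * i));
    }
}
```

For the word 0x1f08ba2f this produces 0x2f, 0xba, 0x08, 0x1f, the same sequence used in the .byte directive.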
Some preparation is also needed, such as loading values into the rs1 (a7) and rs2 (a6) registers. I therefore ended up with the following code, which does not work as expected:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/**
 * rs2: holds the value to be written. I pick a6 register.
 * rs1: holds the address to be written to. I pick a7 register.
 * rd: holds the return value of SC.D instruction. I pick s4 register.
 *
 * @src: pointer to the value to be written. rs2. a6 register
 * @dst: the address to be written to. rs1. a7 register
 * @rd: receives the return value of SC.D
 */
static inline void sc(void *src, void *dst, uint64_t *rd) {
    uint64_t *tmp_src = (uint64_t *)src;
    uint64_t src_val = *tmp_src; // 13
    uint64_t dst_addr = (uint64_t)dst;
    uint64_t ret = 100;

    // first of all, need to prepare the registers a6 and a7.
    /* load value to be written into register a6 */
    __asm__ volatile("ld a6, %0"::"m"(src_val));
    /* load the address to be written to into register a7 */
    __asm__ volatile("ld a7, %0"::"m"(dst_addr));
    /* the actual SC.D: */
    __asm__ volatile(".byte 0x2f, 0xba, 0x08, 0x1f");
    // __asm__ volatile("sc.d s4, a6, (a7)"); // this does not work either.
    /* obtain the value in register s4 */
    __asm__ volatile("sd s4, %0":"=m"(ret));
    *rd = ret;
    return;
}

int main() {
    uint64_t *src = malloc(sizeof(uint64_t));
    uint64_t *dst = malloc(sizeof(uint64_t));
    uint64_t rd = 20;
    *src = 13;
    *dst = 3;
    sc(src, dst, &rd); // write value 13 into @dst, so @dst should be 13 afterwards
    // the expected output should be "dst: 13, rd: 0"
    // What I get: "dst: 3, rd: 1"
    printf("dst: %ld, rd: %ld\n", *dst, rd);
    return 0;
}
The SC.D does not seem to change the value at dst. Which part am I doing wrong? Any hints would be appreciated.