In the Control and Status Registers section of the riscv-asm-manual, there is an example:

.equ RTC_BASE,      0x40000000
.equ TIMER_BASE,    0x40004000

# setup machine trap vector
1:      auipc   t0, %pcrel_hi(mtvec)        # load mtvec(hi)
        addi    t0, t0, %pcrel_lo(1b)       # load mtvec(lo)
        csrrw   zero, mtvec, t0

...
# break on interrupt
mtvec:
        csrrc  t0, mcause, zero
        bgez t0, fail       # interrupt causes are less than zero
        slli t0, t0, 1      # shift off high bit
...

I guess %pcrel_hi(mtvec) calculates the high part of the distance between mtvec and the current PC (here, the address of label 1). Suppose label 1 is at 0x80010000 and mtvec is at 0x80020040. Then %pcrel_hi(mtvec) = (0x80020040 - 0x80010000) >> 12 = 0x00010, so the result of auipc is (0x00010 << 12) + PC = 0x00010000 + 0x80010000 = 0x80020000.

But %pcrel_lo takes 1b as its argument. How is its result calculated to get the final address of mtvec? addi t0, t0, %pcrel_lo(mtvec) seems like the intuitive code, but that is not what is used. Why?

Wanghz
  • https://godbolt.org/z/EebPsc confirms that normal compiler output for a normal C function returning the address of a global variable does the same thing, using `addi` with `%pcrel_lo(.LBB0_1)` address of the auipc, not of the target symbol. So this isn't some special use-case doing something different, it's apparently how RISC-V always works for position-independent code. – Peter Cordes Jan 25 '21 at 05:29
  • Yeah, it seems like a convention, but I just wonder how `%pcrel_lo(label)` will be calculated. – Wanghz Jan 26 '21 at 01:59
  • I haven't grokked it either, I'm waiting for someone to answer this question. I only commented to show my results after checking if this was normal, or somehow specific to this example about interrupt handlers. – Peter Cordes Jan 26 '21 at 03:28

1 Answer

As Peter Cordes indicated in his comment, in your link, and also in this one: for the addi, a label is used rather than the symbol directly, because the addi must contain the low 12 bits of the relative address between the PC and the symbol. However, the PC must be the same one used for the %pcrel_hi, and from the binutils point of view there is no guarantee that these two instructions will follow each other (a guarantee that would have made it possible to calculate this differently). So the solution that was chosen to provide the right PC was to use a label.
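
The arithmetic behind that shared PC can be sketched in Python. This is a minimal illustration, not binutils code: the names `split_pcrel` and `apply_auipc_addi` are made up here, addresses wrap at 32 bits as on RV32, and the `+ 0x800` rounding is there because `addi` sign-extends its 12-bit immediate. The addresses are the ones assumed in the question.

```python
MASK32 = 0xFFFFFFFF

def split_pcrel(auipc_pc, target):
    """Split (target - auipc_pc) into the auipc and addi immediates.

    Returns (hi20, lo12): hi20 is the 20-bit field for auipc, lo12 is the
    signed 12-bit immediate for addi. Both are relative to the auipc's PC.
    """
    offset = target - auipc_pc
    # Round to the nearest multiple of 4096 so the remainder fits in
    # [-2048, 2047], the range of addi's sign-extended immediate.
    hi = (offset + 0x800) >> 12
    lo12 = offset - (hi << 12)
    return hi & 0xFFFFF, lo12

def apply_auipc_addi(auipc_pc, hi20, lo12):
    """Emulate `auipc t0, hi20` followed by `addi t0, t0, lo12` (32-bit wrap)."""
    t0 = (auipc_pc + (hi20 << 12)) & MASK32   # auipc: PC + (imm << 12)
    return (t0 + lo12) & MASK32               # addi: add the signed low part

auipc_pc = 0x80010000   # address of label 1 (assumed in the question)
mtvec = 0x80020040      # address of mtvec (assumed in the question)

hi20, lo12 = split_pcrel(auipc_pc, mtvec)
assert apply_auipc_addi(auipc_pc, hi20, lo12) == mtvec

# If the low part were computed from the addi's own PC instead (4 bytes
# later here), the pair would no longer add up to mtvec.
_, wrong_lo12 = split_pcrel(auipc_pc + 4, mtvec)
assert apply_auipc_addi(auipc_pc, hi20, wrong_lo12) != mtvec
```

Both halves are taken from one offset measured at the auipc, which is why the low part needs a way to name that instruction.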

Now for the 1b in your example: it is not the result of a calculation. All the calculations are done at the linker level; the assembler just takes care of generating the necessary relocation information. 1b means the backward label 1. Numeric labels are used for local references. References to local labels are suffixed with 'f' for a forward reference or 'b' for a backward reference (this is described in your link as well).
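
How `1b` and `1f` pick a definition can be sketched as follows. This is a simplified Python model, not the assembler's implementation: `resolve_local_ref` and the `(offset, number)` list are hypothetical, and positions are compared by byte offset only.

```python
def resolve_local_ref(ref, ref_offset, definitions):
    """Return the offset of the numeric label that `ref` (e.g. '1b') names.

    definitions: list of (offset, number) pairs, one per `N:` in the file.
    ref_offset:  offset of the instruction that contains the reference.
    """
    number, direction = int(ref[:-1]), ref[-1]
    offsets = [off for off, num in definitions if num == number]
    if direction == "b":
        # Backward: the closest definition at or before the reference.
        candidates = [off for off in offsets if off <= ref_offset]
        chosen = max(candidates, default=None)
    elif direction == "f":
        # Forward: the closest definition after the reference.
        candidates = [off for off in offsets if off > ref_offset]
        chosen = min(candidates, default=None)
    else:
        raise ValueError(f"expected an 'f' or 'b' suffix, got {ref!r}")
    if chosen is None:
        raise LookupError(f"no definition found for {ref!r}")
    return chosen

# Label 1 is defined twice; the addi at offset 4 refers back to the first one.
definitions = [(0x0, 1), (0x20, 1)]
assert resolve_local_ref("1b", 0x4, definitions) == 0x0
assert resolve_local_ref("1f", 0x4, definitions) == 0x20
```

Because the same number can be reused many times, the suffix is what makes each reference unambiguous.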

If you assemble this file, you will see that the label is changed to something like .L1XXX, depending on your assembler version. If you then run riscv-objdump -r, you will see an R_RISCV_PCREL_LO12_I relocation against this label at the offset corresponding to the addi.

Basically, you will have something like:

OFFSET           TYPE                  VALUE
0000000000000000 R_RISCV_PCREL_HI20    mtvec
0000000000000000 R_RISCV_RELAX         *ABS*
0000000000000004 R_RISCV_PCREL_LO12_I  .L1^B1

In this example, offset 0 is the offset of .L1^B1 (label 1 after it was transformed). The linker uses the R_RISCV_PCREL_HI20 relocation at that offset to calculate the value used for the auipc.

Then, at offset 4, which is the offset of the addi instruction, the linker finds the R_RISCV_PCREL_LO12_I relocation. It uses the value .L1^B1 to get the PC, and it takes the symbol from the relocation (R_RISCV_PCREL_HI20) located at the offset of that value. It then takes the low 12 bits of the relative address between the PC of .L1^B1 and the address of the symbol it found, mtvec.
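
The two-step lookup described above can be modeled in a few lines of Python. This is a sketch under assumptions, not the actual linker code: the `Reloc` class, the `resolve_lo12` helper, the section base and the symbol addresses are all invented for illustration (the addresses reuse the ones from the question), and relaxation is ignored.

```python
from dataclasses import dataclass

@dataclass
class Reloc:
    offset: int   # offset of the instruction within the section
    rtype: str    # relocation type name
    symbol: str   # symbol the relocation refers to

def resolve_lo12(lo, relocs, symbols, section_base):
    """Compute the 12-bit immediate for an R_RISCV_PCREL_LO12_I relocation."""
    # Step 1: the LO12 symbol is the label on the auipc, so it yields the PC.
    auipc_pc = symbols[lo.symbol]
    # Step 2: find the HI20 relocation sitting at that same address; its
    # symbol is the real target.
    hi = next(r for r in relocs
              if r.rtype == "R_RISCV_PCREL_HI20"
              and section_base + r.offset == auipc_pc)
    offset = symbols[hi.symbol] - auipc_pc
    # Step 3: low part of the offset, matching the rounded high part.
    return offset - (((offset + 0x800) >> 12) << 12)

section_base = 0x80010000   # assumed load address of the section
symbols = {".L1^B1": 0x80010000, "mtvec": 0x80020040}
relocs = [
    Reloc(0x0, "R_RISCV_PCREL_HI20", "mtvec"),
    Reloc(0x4, "R_RISCV_PCREL_LO12_I", ".L1^B1"),
]

lo12 = resolve_lo12(relocs[1], relocs, symbols, section_base)
assert lo12 == 0x40
```

Note that `mtvec` never appears in the LO12 relocation itself; it is reached only through the HI20 relocation at the label's address.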

yflelion
  • Sure the label provides a way to get the right PC, but I haven't figured out how to get the 12 low significant bits of the difference between PC and symbol. – Wanghz Jan 26 '21 at 01:57
  • Ok, finally that answers the question. So `%pcrel_lo(label)` references the `pcrel_hi` relocation in the instruction at `label`, and *that's* why it's not something like `addi t0, t0, %pcrel_lo(mtvec - 1b)`. I wondered if that was the case, because the `addi` or `lw`/`sw` needs to know something about the low bits of the target address, not just the auipc, but until now your answer said nothing about that. – Peter Cordes Jan 26 '21 at 19:29
  • The low 12 bits are calculated from the difference between the address of the symbol (which is 32 bits) and the PC, which is also 32 bits (it is the offset of the label). The linker can get the full 32-bit address of mtvec by using the r_info part of the relocation and the symbol table. – yflelion Jan 26 '21 at 21:19
  • I get that, but it was completely non-obvious that `%pcrel_lo(1b)` indirectly references the `%pcrel_hi(mtvec)` at `1`. That's the part that wasn't explained anywhere (until your last edit), and is totally different from how `%hi(symbol)` and `%lo(symbol)` works. – Peter Cordes Jan 28 '21 at 20:56
  • Indeed, my first answer did not clarify this point well. – yflelion Jan 28 '21 at 22:03
  • @Wanghz does that answer your question or do you need more help? – yflelion Jan 28 '21 at 22:25
  • @yflelion the label makes sure that `pcrel_hi` and `pcrel_lo` use the same pc for calculation, also `pcrel_lo` can find the symbol `mtvec` through the relocation `R_RISCV_PCREL_HI20`. Yeah, I think that solves my problem. Thank you for your answer. – Wanghz Jan 29 '21 at 02:33
  • @PeterCordes Also thank you for your help. – Wanghz Jan 29 '21 at 02:34