I am using this textbook: Randal E. Bryant, David R. O’Hallaron - Computer Systems. A Programmer’s Perspective [3rd ed.] (2016, Pearson), and there is a section I don't understand very well.
C code:
void write_read(long *src, long *dst, long n)
{
    long cnt = n;
    long val = 0;
    while (cnt) {
        *dst = val;
        val = (*src)+1;
        cnt--;
    }
}
Inner loop of write_read:
# src in %rdi, dst in %rsi, val in %rax
.L3:
    movq %rax, (%rsi)    # Write val to dst
    movq (%rdi), %rax    # t = *src
    addq $1, %rax        # val = t+1
    subq $1, %rdx        # cnt--
    jne .L3              # If != 0, goto loop
Given this code, the textbook gives a diagram (Figure 5.35) to describe the program flow.
This is the explanation given, for those who don't have access to the textbook:
Figure 5.35 shows a data-flow representation of this loop code. The instruction movq %rax,(%rsi) is translated into two operations: The s_addr instruction computes the address for the store operation, creates an entry in the store buffer, and sets the address field for that entry. The s_data operation sets the data field for the entry. As we will see, the fact that these two computations are performed independently can be important to program performance. This motivates the separate functional units for these operations in the reference machine.
In addition to the data dependencies between the operations caused by the writing and reading of registers, the arcs on the right of the operators denote a set of implicit dependencies for these operations. In particular, the address computation of the s_addr operation must clearly precede the s_data operation.
In addition, the load operation generated by decoding the instruction movq (%rdi), %rax must check the addresses of any pending store operations, creating a data dependency between it and the s_addr operation. The figure shows a dashed arc between the s_data and load operations. This dependency is conditional: if the two addresses match, the load operation must wait until the s_data has deposited its result into the store buffer, but if the two addresses differ, the two operations can proceed independently.
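The conditional dependency described above can be sketched as a toy C model. This is only an illustration under assumptions, not how the book or real hardware implements it: the names sb_entry, store_buffer, drain and write_read_model, the capacity of 8, and the in-order drain are all made up here, and the model has no notion of timing. It only records whether each load had to take its value from a pending store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SB_CAPACITY 8            /* hypothetical size, chosen for the sketch */

typedef struct {
    long *addr;                  /* address field, set by s_addr */
    long  data;                  /* data field, set by s_data */
    bool  data_ready;            /* false until s_data has run */
} sb_entry;

typedef struct {
    sb_entry entries[SB_CAPACITY];
    size_t   count;              /* number of pending stores */
} store_buffer;

/* s_addr: create a store-buffer entry and set its address field. */
static size_t s_addr(store_buffer *sb, long *addr)
{
    assert(sb->count < SB_CAPACITY);
    sb->entries[sb->count].addr = addr;
    sb->entries[sb->count].data_ready = false;
    return sb->count++;
}

/* s_data: set the data field of an entry created earlier by s_addr. */
static void s_data(store_buffer *sb, size_t idx, long value)
{
    sb->entries[idx].data = value;
    sb->entries[idx].data_ready = true;
}

/* load: compare the load address with every pending store, newest first.
 * On a match the value comes from the store buffer (in hardware the load
 * would have to wait for s_data here); otherwise it comes from memory. */
static long load(const store_buffer *sb, const long *addr, bool *forwarded)
{
    for (size_t i = sb->count; i-- > 0; ) {
        if (sb->entries[i].addr == addr) {
            assert(sb->entries[i].data_ready);
            *forwarded = true;
            return sb->entries[i].data;
        }
    }
    *forwarded = false;
    return *addr;
}

/* Write pending stores to memory in program order and empty the buffer. */
static void drain(store_buffer *sb)
{
    for (size_t i = 0; i < sb->count; i++)
        *sb->entries[i].addr = sb->entries[i].data;
    sb->count = 0;
}

/* Replays the write_read loop through the model and returns how many
 * loads matched a pending store. */
long write_read_model(long *src, long *dst, long n)
{
    store_buffer sb = { .count = 0 };
    long val = 0;
    long forwarded_loads = 0;

    for (long cnt = n; cnt != 0; cnt--) {
        size_t e = s_addr(&sb, dst);       /* movq %rax,(%rsi): address part */
        s_data(&sb, e, val);               /* movq %rax,(%rsi): data part */

        bool fwd;
        long t = load(&sb, src, &fwd);     /* movq (%rdi),%rax */
        forwarded_loads += fwd;
        val = t + 1;                       /* addq $1,%rax */

        if (sb.count == SB_CAPACITY)       /* make room for later stores */
            drain(&sb);
    }
    drain(&sb);
    return forwarded_loads;
}
```

In this sketch, calling write_read_model(&a[0], &a[1], 3) reports no load that matched a pending store, while write_read_model(&a[0], &a[0], 3) reports one per iteration. Note that only the store instruction goes through s_addr and s_data in the model; the load instruction only performs the address check.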
a) What I am not really clear about is why, after the line movq %rax,(%rsi), there needs to be a load done after s_data is called. I'm assuming that when s_data is called, the value of %rax is stored in the location that the address in %rsi points to? Does this mean that after every s_data there needs to be a load call?
b) It doesn't really show in the diagram, but from what I understand from the explanation given in the book, does the line movq (%rdi), %rax require its own set of s_addr and s_data? So is it accurate to say that all movq calls require an s_addr and s_data call, followed by a check of whether the addresses match, before calling load?
I'm quite confused over these parts and would appreciate it if someone could explain how the s_addr and s_data calls work with load, and when it is required to have these functions. Thank you!