I'm working on an assembly function that sets a buffer to zero, and I'm measuring the clock cycles it takes to execute. The problem is that the measured cycle count stays the same even as the buffer size grows, and I can't explain this behavior.
Here's the assembly function I'm using:
_set0:
set0:
    movq $0, (%rdi)      # zero the 48-byte buffer, 8 bytes at a time
    movq $0, 8(%rdi)
    movq $0, 16(%rdi)
    movq $0, 24(%rdi)
    movq $0, 32(%rdi)
    movq $0, 40(%rdi)
    ret
I expected that as I increase the number of movq instructions (and hence the buffer size), the number of clock cycles required to execute the function would increase proportionally. However, when I modify the function as follows:
_set0:
set0:
    movq $0, (%rdi)      # zero the 96-byte buffer, 8 bytes at a time
    movq $0, 8(%rdi)
    movq $0, 16(%rdi)
    movq $0, 24(%rdi)
    movq $0, 32(%rdi)
    movq $0, 40(%rdi)
    movq $0, 48(%rdi)
    movq $0, 56(%rdi)
    movq $0, 64(%rdi)
    movq $0, 72(%rdi)
    movq $0, 80(%rdi)
    movq $0, 88(%rdi)
    ret
The number of clock cycles measured remains the same, despite the increased buffer size.
I would appreciate any insights or suggestions as to why the measured clock cycles do not increase with the buffer size as expected.
To measure clock cycles, I call the function from a C file, using this helper to read the timestamp counter:
static inline uint64_t cpucycles(void) {
    uint64_t result;
    /* rdtsc puts the low 32 bits of the TSC in %rax and the high 32 bits in %rdx;
       shift and OR them into a single 64-bit value */
    __asm__ volatile("rdtsc; shlq $32,%%rdx; orq %%rdx,%%rax" : "=a"(result) : : "%rdx");
    return result;
}
and then I take the median like this:
static uint64_t cpucycles_median(uint64_t *cycles, size_t timings) {
    /* turn the raw timestamps into per-run deltas, then take their median */
    for (size_t i = 0; i < timings - 1; i++) {
        cycles[i] = cycles[i + 1] - cycles[i];
    }
    return median(cycles, timings - 1);
}
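The median() helper isn't shown above; it just sorts the deltas and returns the middle element. A minimal sketch of such a helper (illustrative only; my actual median() may differ slightly) would be:

#include <stdint.h>
#include <stdlib.h>

/* comparison function for qsort over uint64_t values */
static int cmp_uint64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* sketch of a median helper: sort the array and return the middle element */
static uint64_t median(uint64_t *values, size_t n) {
    qsort(values, n, sizeof(uint64_t), cmp_uint64);
    return values[n / 2];
}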
To compute the number of cycles the function takes, I run it 1000 times and take the median of the per-run cycle counts.
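Roughly, the driver looks like the following (a simplified sketch, with cpucycles() and cpucycles_median() as defined above; the buffer size, iteration count, and set0 prototype here are illustrative, not my exact code). It records a timestamp before each call and then hands the raw timestamps to cpucycles_median:

#include <stdint.h>
#include <stdio.h>

#define TIMINGS 1001  /* 1001 timestamps give 1000 per-run deltas */

extern void set0(uint64_t *buf);  /* the assembly function above */

int main(void) {
    uint64_t buf[12] = {0};        /* large enough for the 96-byte version */
    uint64_t cycles[TIMINGS];

    for (size_t i = 0; i < TIMINGS; i++) {
        cycles[i] = cpucycles();   /* timestamp before each run */
        set0(buf);
    }

    printf("median cycles: %llu\n",
           (unsigned long long)cpucycles_median(cycles, TIMINGS));
    return 0;
}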