Details of the test program
I tested this in user mode using the example from the Intel SDM. The details of the test are as follows:
- A global lock is placed in System V shared memory so that multiple processes can use it
- The assembly example from the Intel SDM is used as the core of the lock-acquisition code:
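```c
int get_lock(void)
{
    int locked_val = 1;
    /* Local labels (1:, 2:) avoid duplicate symbols if this asm is emitted twice. */
    __asm__ __volatile__(
        "1:                                         \n" /* Spin_Lock */
        "cmpl $0, (%[global_lock_addr])             \n" /* 32-bit, same width as the XCHG */
        "je 2f                                      \n"
        FILL_INST "\n"                                  /* FILL_INST is "nop" or "pause" */
        "jmp 1b                                     \n"
        "2:                                         \n" /* Get_Lock */
        "movl $1, %[locked_val]                     \n"
        "xchgl %[locked_val], (%[global_lock_addr]) \n" /* implicitly LOCKed */
        "cmpl $0, %[locked_val]                     \n"
        "jne 1b                                     \n" /* lost the race: spin again */
        /* falling through here is Get_Lock_Success */
        : [locked_val] "+r" (locked_val)           /* written by MOV/XCHG, so in/out */
        : [global_lock_addr] "r" (global_lock)
        : "memory", "cc");                         /* compiler barrier for the critical section */
    return 0;
}
```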
- The lock is released as follows:
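```c
int release_lock(void)
{
    int unlock_val = 0;
    /* XCHG writes unlock_val, so it must be an in/out ("+r") operand. */
    __asm__ __volatile__("xchgl %[unlock_val], (%[global_lock_addr])"
                         : [unlock_val] "+r" (unlock_val)
                         : [global_lock_addr] "r" (global_lock)
                         : "memory");
    return 0;
}
```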
- Each process exits after acquiring and releasing the lock `LOOP_TIME` times:
```c
int main()
{
    ...
    printf("exec lock \n");
    for (i = 0; i < LOOP_TIME; i++) {
        get_lock();
        release_lock();
    }
    ...
}
```
- Two executables can be built with this Makefile:
  - `spinlock_pause`: the `FILL_INST` macro is defined as the string `"pause"`
  - `spinlock_nopause`: the `FILL_INST` macro is defined as the string `"nop"`
- The `exec.sh` script builds and runs the test. It requires the `CUR_EXEC_NUM` environment variable, which sets how many processes are started.
- `perf` is used to collect the `machine_clears.memory_ordering` and `inst_retired.any` events while each program runs
- The test results are saved in the `./log` directory
Test results
Test environment
- CPU: Intel Xeon E5-2666 v3 (x86)
- Cores: 20, threads per core: 2
- `export CUR_EXEC_NUM=40` (one program per hardware thread)
Results
- `spinlock_pause`

```
get lock time 5000000

 Performance counter stats for '././spinlock_pause':

         5,228,707      machine_clears.memory_ordering:u
     5,018,001,922      inst_retired.any:u

      78.053822086 seconds time elapsed

      76.887470000 seconds user
       0.022657000 seconds sys
```
- `spinlock_nopause`

```
get lock time 5000000

 Performance counter stats for '././spinlock_nopause':

        74,524,989      machine_clears.memory_ordering:u
    21,212,346,839      inst_retired.any:u

      73.076739387 seconds time elapsed

      72.129267000 seconds user
       0.010899000 seconds sys
```
From the above results, the `pause` instruction reduces the `machine_clears.memory_ordering` event count (I am not sure whether this event is equivalent to a memory order violation). However, because `pause` takes many more cycles than `nop`, the number of instructions executed (`inst_retired.any`) also drops, so I don't think this test shows that `pause` avoids `machine_clears.memory_ordering`.

On the other hand, the total execution time of the program using `pause` is NOT LESS than that of the program using `nop`.
Given these results, are there any flaws in my test, and how should these results be explained?