In this paper, it is written that the 8 bytes sequential write of clwb
and ntstore
of optane PM have 90ns and 62ns latency, respectively, and sequential reading is 169ns.
But in my test with Intel 5218R CPU, clwb
is about 700ns and ntstore
is about 1200ns. Of course, there is a difference between my test method and the paper, but the result is too bad, which is unreasonable. And my test is closer to actual usage.
During the test, did the Write Pending Queue of CPU's iMC or the WC buffer in the optane PM become the bottleneck, causing blockage, and the measured latency has been inaccurate? If this is the case, is there a tool to detect it?
#include "libpmem.h"
#include "stdio.h"
#include "x86intrin.h"
//gcc aep_test.c -o aep_test -O3 -mclwb -lpmem
int main()
{
size_t mapped_len;
char str[32];
int is_pmem;
sprintf(str, "/mnt/pmem/pmmap_file_1");
int64_t *p = pmem_map_file(str, 4096 * 1024 * 128, PMEM_FILE_CREATE, 0666, &mapped_len, &is_pmem);
if (p == NULL)
{
printf("map file fail!");
exit(1);
}
if (!is_pmem)
{
printf("map file fail!");
exit(1);
}
struct timeval start;
struct timeval end;
unsigned long diff;
int loop_num = 10000;
_mm_mfence();
gettimeofday(&start, NULL);
for (int i = 0; i < loop_num; i++)
{
p[i] = 0x2222;
_mm_clwb(p + i);
// _mm_stream_si64(p + i, 0x2222);
_mm_sfence();
}
gettimeofday(&end, NULL);
diff = 1000000 * (end.tv_sec - start.tv_sec) + end.tv_usec - start.tv_usec;
printf("Total time is %ld us\n", diff);
printf("Latency is %ld ns\n", diff * 1000 / loop_num);
return 0;
}
Any help or correction is much appreciated!