I wrote a simple program to test page faults and TLB misses with perf. The code is as follows. It writes 1 GB of data sequentially, so I expected it to trigger roughly 1 GB / 4 KB = 256K TLB misses and page faults.
#include <stdio.h>
#include <stdlib.h>

#define STEP   64                  /* stride between writes, in bytes */
#define LENGTH (1024*1024*1024)    /* 1 GB */

int main(){
    char* a = malloc(LENGTH);
    int i;
    if (a == NULL)
        return 1;
    /* Write one byte every STEP bytes, so every 4 KB page is touched. */
    for(i = 0; i < LENGTH; i += STEP){
        a[i] = 'a';
    }
    return 0;
}
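For comparison, here is a minimal variant (just a sketch, assuming a 4 KB page size) that steps by exactly one page, so each of the LENGTH/4096 = 262144 pages is written exactly once. Since the original already touches every page at least once, the expected page-fault count should be the same order of magnitude.

#include <stdlib.h>

#define PAGE   4096                /* assumed page size */
#define LENGTH (1024*1024*1024)    /* 1 GB */

int main(){
    char* a = malloc(LENGTH);
    long i;
    if (a == NULL)
        return 1;
    /* One write per page: LENGTH/PAGE = 262144 first touches. */
    for(i = 0; i < LENGTH; i += PAGE){
        a[i] = 'a';
    }
    return 0;
}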
However, the result is as follows and is far smaller than expected. Is perf really that imprecise? I would appreciate it if anyone could run the code on their machine and compare.
$ perf stat -e dTLB-load-misses,page-faults ./a.out
Performance counter stats for './a.out':
12299 dTLB-load-misses
1070 page-faults
0.427970453 seconds time elapsed
Environment: Ubuntu 14.04.5 LTS, kernel 4.4.0; gcc 4.8.4, glibc 2.19. No compiler flags were used.
The CPU is an Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz.
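The 256K estimate assumes 4 KB pages; a quick way to confirm the page size on the test machine (a small sketch using the standard POSIX sysconf call) is:

#include <stdio.h>
#include <unistd.h>

int main(){
    /* Print the page size used in the 1 GB / page-size estimate. */
    printf("page size: %ld bytes\n", sysconf(_SC_PAGESIZE));
    return 0;
}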