I wrote a simple program to test page faults and TLB misses with perf. The code is as follows. It writes across 1 GB of memory sequentially (one byte every 64 bytes), so I expect it to trigger 1 GB / 4 KB = 256K TLB misses and page faults.

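#include <stdio.h>
#include <stdlib.h>

#define STEP 64                     /* one write per 64 bytes */
#define LENGTH (1024*1024*1024)     /* 1 GB */

int main(void){
    char* a = malloc(LENGTH);
    int i;

    if(a == NULL){                  /* malloc can fail for a 1 GB request */
        perror("malloc");
        return 1;
    }
    /* touch the buffer sequentially, one byte every STEP bytes */
    for(i=0; i<LENGTH; i+=STEP){
        a[i] = 'a';
    }

    return 0;
}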

However, the result is as follows and is far smaller than expected. Is perf really that imprecise? I would appreciate it if anyone could run the code on their machine.

 $ perf stat -e dTLB-load-misses,page-faults ./a.out

   Performance counter stats for './a.out':

         12299      dTLB-load-misses
          1070      page-faults

   0.427970453 seconds time elapsed

Environment: Ubuntu 14.04.5 LTS, kernel 4.4.0, gcc 4.8.4, glibc 2.19. No compile flags.

The CPU is Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz.

Waker Leo
  • Do you have transparent huge pages enabled? This might be a reason for lower TLB-load-misses. – andrjas Nov 29 '17 at 08:04
  • No. Actually, I wanted to use huge pages but found that they did not help, so I profiled and found that 4 KB pages already worked well. – Waker Leo Nov 29 '17 at 08:13
  • It's better to copy the text output from your program and paste it here instead of posting a screenshot. Also, you [don't need to cast the result of malloc in C](https://stackoverflow.com/q/605845/995714) – phuclv Nov 29 '17 at 08:28
  • Thanks for your advice. Updated. – Waker Leo Nov 29 '17 at 08:41
  • I can't reproduce the low number of page faults. Please provide more system details (versions of distribution, kernel, glibc, compiler (flags)). – Zulan Nov 29 '17 at 23:15

1 Answer

The kernel prefetches pages on a fault, at least once it has evidence of an access pattern. I can't find a definitive reference for the algorithm, but https://github.com/torvalds/linux/blob/master/mm/readahead.c may be a starting point for seeing what is going on. I'd look for other performance counters that capture the behavior of this mechanism.

Zalman Stern