How to generate x86 CPU-to-main-memory traffic in C to fill and invalidate the TLB on a Linux machine

Question

I want to generate CPU-to-memory traffic on an x86 Linux machine (Ubuntu 18, gcc toolchain) that first fills up the TLB (translation lookaside buffer) and then, once it is full, causes TLB invalidations. I wrote the simple code below, but I am not sure whether it achieves the goal of filling the TLB and then invalidating it.

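#include <stdio.h>

int main(void)
{
    int array[1000];   /* 1000 * sizeof(int) bytes on the stack */
    int i;
    long sum = 0;

    /* Write every element once. */
    for (i = 0; i < 1000; i++)
    {
        array[i] = i;
    }

    /* Read every element back and accumulate. */
    for (i = 0; i < 1000; i++)
    {
        sum += array[i];
    }

    /* Use the result so the compiler cannot discard the loops. */
    printf("%ld\n", sum);

    return 0;
}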

Here is the processor-specific info, in case it is useful:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 1
model name      : AMD EPYC 7281 16-Core Processor
stepping        : 2
microcode       : 0x8001227
cpu MHz         : 2694.732
cache size      : 512 KB
physical id     : 0
siblings        : 32
core id         : 0
cpu cores       : 16
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes

The crux is to get answers to the following:

What triggers TLB invalidations?
How can one verify that TLB invalidations actually happened?

Without knowing the precise characteristics of the CPU this is running on and how the compiler will generate machine instructions this is not possible. Also if this is just C code, please, **please** do not tag as C++ for no reason. — tadman, May 30 '18 at 17:22
@EugeneSh. While it is indeed mostly architecture specific, I can't help but wonder if there might be a way to pull this off reliably by using `volatile`. — , May 30 '18 at 17:22
@Frank How would `volatile` help on a system without any cache or tlb? — Eugene Sh., May 30 '18 at 17:23
@Frank `volatile` is just a way of avoiding compiler optimizations that assume the data can't change unless written to by the code. — tadman, May 30 '18 at 17:25
"x86 CPU" is not specific enough, that may as well say "some kind of computer". For this kind of thing you'll need a specific Intel or AMD part number. Xeon, i7, i5, i3, Pentium, Epyc, Ryzen, they're all substantially different, even between "generations". A Xeon 2670 and 2670v3 are two radically different chips despite the same base part number. — tadman, May 30 '18 at 17:41
@tadman And apparently it would be pretty hard to get the processor-specific answer. Too narrow.. — Eugene Sh., May 30 '18 at 17:42
@EugeneSh. I'm not disagreeing with that, either. This would be really difficult to do reliably on a single target CPU, but at least it would be theoretically possible with enough experimentation. — tadman, May 30 '18 at 17:43
There's no easy answer here. The first question you should be asking is "How do I know if the TLB has been invalidated?" and to get that answer you need to read the processor documentation. That is a nice CPU though. — tadman, May 30 '18 at 17:45
@tadman Found this question that might be useful https://stackoverflow.com/questions/20183273/how-to-test-main-memory-access-time?noredirect=1&lq=1 — tulamba, May 30 '18 at 17:53
Are you trying to measure TLB evictions from needing to load new entries? i.e. capacity misses and conflict misses. In that case touch a single line per 4k page (preferably chosen to avoid L1d conflict misses). Use `volatile` loads. Disable anonymous hugepages, or account for it in your microbenchmark. Or are you trying to measure cases where the kernel used `invlpg` to actively invalidate a TLB entry? — Peter Cordes, May 31 '18 at 00:14
Basically you should read up on how Zen's TLB works (i.e. fully associative or not), and design your microbenchmark to confirm that. It's much easier to write a microbenchmark when you already know the result you're expecting, so you should aim for this most of the time :P If not, it's easy to actually measure something without realizing it. — Peter Cordes, May 31 '18 at 00:17
The code accesses only a 4KB array, which can be contained within a single page or at most two pages. So the code only fills one or two TLB entries. — Hadi Brais, May 31 '18 at 04:11
@HadiBrais Could you change the code so that it fills up all TLB entries? — tulamba, May 31 '18 at 15:30
@HadiBrais Please post this as a solution. Even pseudocode is fine; I will put in the effort. I am looking for clear guidelines. — tulamba, May 31 '18 at 18:26
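
The suggestions in the comments above (one access per 4 KiB page, `volatile` loads, huge pages disabled) can be sketched roughly as follows. This is only a minimal sketch, not a verified benchmark: the function names `probe_offset`, `map_pages`, `fault_in`, `touch_pages` and `ns_per_access` are hypothetical, the 64-byte cache line size is an assumption, and it targets TLB capacity misses (entries evicted to make room for new ones), not invalidations issued explicitly by the kernel.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Offset of the probed byte inside page i. It advances by one assumed
 * 64-byte cache line per page so the probes do not all land in the same
 * L1d set. */
size_t probe_offset(size_t i, size_t page_size)
{
    return (i * 64) % page_size;
}

/* Map npages anonymous pages and ask the kernel not to back them with
 * transparent huge pages, so every small page needs its own TLB entry. */
unsigned char *map_pages(size_t npages, size_t page_size)
{
    void *p = mmap(NULL, npages * page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    /* Best effort: the call may fail on kernels built without THP. */
    (void)madvise(p, npages * page_size, MADV_NOHUGEPAGE);
    return p;
}

/* Write one byte per page so each page is faulted in and gets its own
 * physical frame before anything is timed. */
void fault_in(unsigned char *buf, size_t npages, size_t page_size)
{
    for (size_t i = 0; i < npages; i++)
        buf[i * page_size + probe_offset(i, page_size)] = 1;
}

/* One volatile load per page; returns the sum of the bytes read. */
unsigned long touch_pages(const unsigned char *buf, size_t npages,
                          size_t page_size)
{
    const volatile unsigned char *p = buf;
    unsigned long sum = 0;
    for (size_t i = 0; i < npages; i++)
        sum += p[i * page_size + probe_offset(i, page_size)];
    return sum;
}

/* Average wall-clock nanoseconds per load over `passes` sweeps.
 * Expects fault_in() to have been called on buf first. */
double ns_per_access(const unsigned char *buf, size_t npages,
                     size_t page_size, int passes)
{
    struct timespec t0, t1;
    unsigned long sink = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < passes; r++)
        sink += touch_pages(buf, npages, page_size);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Every probed byte was set to 1 by fault_in(). */
    assert(sink == (unsigned long)passes * npages);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / ((double)passes * (double)npages);
}
```

A driver would obtain the page size with `sysconf(_SC_PAGESIZE)`, call `map_pages` and `fault_in`, and then print `ns_per_access` for a growing number of pages (for example 16, 32, 64, and so on). If the time per access steps up once the page count exceeds the capacity of a TLB level, that suggests entries are being evicted; the actual capacities have to come from the processor documentation. Assuming the generic `dTLB-load-misses` event is available on this CPU, running the program under `perf stat -e dTLB-load-misses` is one way to cross-check the timing.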

0 Answers