I'm running a benchmark on a Xeon server and repeating each execution 2-3 times. I'd like to erase the contents of the L1 and L2 caches between runs. Can you suggest any methods for doing so?
-
Which architecture and OS are you testing? – Mircea Vutcovici Aug 09 '10 at 18:54
-
I suppose doing some random other things on the server for a minute or so would be a bit too crude? – Dentrasi Aug 09 '10 at 21:49
-
My question is why would you want to do that? – Natalie Adams Aug 10 '10 at 03:38
-
I'm running a benchmark more than once to collect data on memory and cache behaviour. I do not want caching to affect my results. – Sharat Chandra Aug 16 '10 at 20:38
-
I'm running Linux on the Intel x86_64 architecture. – Sharat Chandra Aug 16 '10 at 20:39
-
The duplicate question is no longer in existence. – jogojapan Aug 29 '12 at 01:09
1 Answer
Try repeatedly touching a large block of data with the CPU (i.e. not via DMA), for example:
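#include <stdlib.h>

int main(void) {
    const size_t size = 20 * 1024 * 1024;  // Allocate 20M. Set much larger than L2
    volatile char *c = malloc(size);       // volatile keeps the stores from being optimized away
    if (c == NULL)
        return 1;
    for (unsigned i = 0; i < 0xffff; i++)
        for (size_t j = 0; j < size; j++)
            c[j] = (char)(i * j);          // unsigned arithmetic avoids signed overflow
    free((void *)c);
    return 0;
}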
However, depending on the server, the disk cache (held in memory) may be a bigger problem than the L1/L2 caches. On Linux, for example, drop it as root using:
sync
echo 3 > /proc/sys/vm/drop_caches
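If the benchmark harness is itself written in C, the same two steps can be done from code. This is only a minimal sketch: the helper name `drop_page_cache` is hypothetical, and the path is passed in as a parameter so the function can be tried against an ordinary file. On a real system the path would be /proc/sys/vm/drop_caches and the process would need to run as root.

```c
#define _XOPEN_SOURCE 700   /* for sync() under strict compiler modes */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: flush dirty pages to disk, then write "3" to
 * `path`. With path = "/proc/sys/vm/drop_caches" (root required) this
 * asks the kernel to drop the page cache, dentries and inodes.
 * Returns 0 on success, -1 on failure. */
int drop_page_cache(const char *path)
{
    assert(path != NULL);
    sync();                          /* same as the `sync` command */
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;                   /* e.g. permission denied when not root */
    int ok = (fputs("3\n", f) >= 0);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling `drop_page_cache("/proc/sys/vm/drop_caches")` before each run would mirror the shell commands above; a return value of -1 most likely means the program is not running as root.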
Edit: It is trivial to generate a large program which does nothing:
#!/usr/bin/ruby
puts "main:"
200000.times { puts " nop" }
puts " xor rax, rax"
puts " ret"
Running it a few times under different names (the generated code, not the script) should do the job.
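The same generator can be sketched in C, here mixing a few harmless register-only instructions instead of emitting only `nop`, along the lines suggested in the comments below. The function name `emit_filler`, the seed parameter and the exact instruction list are assumptions made for illustration; the output uses the same Intel-style syntax as the Ruby script and, like it, leaves out any assembler directives.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical generator: writes a `main:` label, `count` pseudo-randomly
 * chosen filler instructions, then `xor rax, rax` and `ret` to `out`.
 * Returns the total number of lines written. */
size_t emit_filler(FILE *out, size_t count, unsigned seed)
{
    static const char *const ops[] = {
        "nop", "xor rax, rbx", "add rax, rbx"
    };
    const size_t n_ops = sizeof ops / sizeof ops[0];

    assert(out != NULL);
    srand(seed);                         /* fixed seed gives reproducible output */
    fputs("main:\n", out);
    for (size_t i = 0; i < count; i++)
        fprintf(out, "    %s\n", ops[(size_t)rand() % n_ops]);
    fputs("    xor rax, rax\n", out);    /* return value 0 */
    fputs("    ret\n", out);
    return count + 3;                    /* label + fillers + xor + ret */
}
```

Calling it with different seeds gives differently shaped programs of the same size, which may make it harder for the CPU to treat the filler as one repeated pattern; whether that matters on a given processor is something to measure rather than assume.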

Maciej Piechotka
-
Most modern CPUs have separate instruction and data caches; while cycling through 20M of RAM might clean the data cache, it won't touch the instruction cache. Additionally, there's no guarantee the CPU will use all of its cache; it might just reuse the same small section continuously. – Chris S Aug 09 '10 at 20:03
-
The solution is basically the same: generate a lot of code and execute it. – Maciej Piechotka Aug 09 '10 at 20:47
-
Newer processors are going to recognize the pattern and will not invalidate the existing cache lines, so it will only use 2 (or so) lines of cache for your program. If cache is a big factor, it is better to just turn it off and not use it. On the other hand, it probably isn't making two hoots of difference in the first place. – Chris S Aug 09 '10 at 23:06
-
I disagree with 'just turning it off'. Cache affects optimization techniques to a large extent, and turning it off will affect the result. It is better to randomize the technique (e.g. random instructions such as `nop`, `xor rax, rbx`, `add rax, rbx`, etc.). – Maciej Piechotka Aug 10 '10 at 15:01
-
@MaciejPiechotka Instead of using 2 nested loops, wouldn't it be better to, for example, just increment `c[j]`? With your solution you cycle 65k times over the same 20MB of data, "renewing" it at each outer iteration... I cannot see why. – Nicolò Ghielmetti Jul 20 '20 at 13:06
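A single-pass variant along the lines of the last comment could look like the sketch below. It increments one byte per cache line instead of rewriting every byte 65k times. The function name `sweep_buffer` is hypothetical, and the line size is a parameter because the usual 64 bytes on x86_64 is an assumption to check for the actual CPU. As the earlier comments point out, a sweep like this touches only the data caches and gives no hard guarantee that every old line is evicted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: increment one byte in every `line`-sized chunk
 * of `buf`, so each cache line of the buffer is read and written once.
 * `buf` is volatile so the compiler cannot drop the accesses.
 * Returns the number of lines touched. */
size_t sweep_buffer(volatile unsigned char *buf, size_t size, size_t line)
{
    size_t touched = 0;

    assert(buf != NULL && line > 0);
    for (size_t j = 0; j < size; j += line) {
        buf[j]++;        /* read-modify-write brings the line into the cache */
        touched++;
    }
    return touched;
}
```

Between benchmark runs this would be called on a buffer much larger than the cache, for example `sweep_buffer(buf, 20u * 1024 * 1024, 64)` with a 20M buffer as in the answer.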