10

On i386 Linux. Preferably in C (C/POSIX standard libraries) or through /proc if possible. If not, is there any piece of assembly or third-party library that can do this?

Edit: I'm trying to develop a test of whether a kernel module clears a single cache line or the whole processor cache (with wbinvd()). The program runs as root, but I'd prefer to stay in user space if possible.

Roman A. Taycher
  • 2
    Do you mean the processor cache? And how much time exactly is recently? – Gunther Piez May 18 '11 at 07:31
  • A lot less than a second (I know that's not very exact). Basically I want to test whether a fairly quick function flushed the cache. – Roman A. Taycher May 18 '11 at 07:39
  • 1
    Question makes no sense: (a) there are typically multiple caches, (b) cache lines are being evicted all the time, (c) "flushing" the cache (i.e. evicting all cache lines) is not something that will normally ever happen, (d) in general the CPU has no knowledge of what is going on in any of the caches – Paul R May 18 '11 at 07:53
  • It might not be common, but flushing is something that is done in some places, no? – Roman A. Taycher May 18 '11 at 08:04
  • 2
    @Roman: yes, `WBINVD` flushes all the data caches (it's not clear whether it also flushes the L1 instruction cache, and its implementation is CPU-dependent), but it's pretty unusual to use this instruction (self-modifying code is the only example that comes to mind) and even so, the caches will immediately start to fill again and the CPU has no direct knowledge of the current state of any of the caches. You should explain what it is that you are *really* trying to achieve, i.e. the motivation behind your question. – Paul R May 18 '11 at 08:06
  • whether a low level caching scheme is flushing the cache under certain circumstances. – Roman A. Taycher May 18 '11 at 08:36
  • Does `opcontrol --list-events` show any interesting event? – ninjalj Jul 02 '11 at 10:40

3 Answers

12

Cache coherent systems do their utmost to hide such things from you. I think you will have to observe it indirectly, either by using performance counting registers to detect cache misses or by carefully measuring the time to read a memory location with a high resolution timer.

This program works on my x86_64 box to demonstrate the effects of clflush. It times how long it takes to read a global variable using rdtsc. Being a single instruction tied directly to the CPU clock makes direct use of rdtsc ideal for this.

Here is the output:

took 81 ticks
took 81 ticks
flush: took 387 ticks
took 72 ticks

You see four timed reads: the first ensures i is in the cache (which it is, because it was just zeroed as part of BSS), and the second is a read of i that should hit the cache. Then clflush kicks i out of the cache (along with its neighbors in the same cache line), and re-reading it takes significantly longer. A final read verifies it is back in the cache. The results are very reproducible and the difference is substantial enough to easily see the cache misses. If you cared to calibrate the overhead of rdtsc() you could make the difference even more pronounced.
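The rdtsc() calibration mentioned above could look roughly like the sketch below. It is not part of the original program: `tick_source`, `rdtsc_ticks`, `timer_overhead` and `net_ticks` are names made up for illustration, and it assumes that the minimum gap between two back-to-back counter reads is a fair estimate of the timer's own cost.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: any function returning an increasing tick count. */
typedef uint64_t (*tick_source)(void);

#if defined(__i386__) || defined(__x86_64__)
/* x86 tick source: RDTSC returns the time-stamp counter in EDX:EAX. */
uint64_t
rdtsc_ticks(void)
{
    uint32_t lo, hi;
    __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
    return lo | ((uint64_t)hi << 32);
}
#endif

/*
 * Smallest gap seen between two back-to-back readings of the counter.
 * The minimum is used because interrupts and other noise only add ticks.
 */
uint64_t
timer_overhead(tick_source now, int trials)
{
    uint64_t best = UINT64_MAX;

    assert(now != 0 && trials > 0);
    for (int t = 0; t < trials; t++) {
        uint64_t start = now();
        uint64_t end = now();
        if (end - start < best)
            best = end - start;
    }
    return best;
}

/* Subtract the calibrated overhead from a raw measurement, clamping at zero. */
uint64_t
net_ticks(uint64_t raw, uint64_t overhead)
{
    return raw > overhead ? raw - overhead : 0;
}
```

In the test program you would call `timer_overhead(rdtsc_ticks, 1000)` once at startup and print `net_ticks(end - start, overhead)` instead of the raw difference. The indirect call through `now` costs a few ticks itself, but it is paid during calibration as well, so it is part of what gets subtracted.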

If you can't read the memory address you want to test (although even mmap of /dev/mem should work for these purposes) you may be able to infer what you want if you know the cacheline size and associativity of the cache. Then you can use accessible memory locations to probe the activity in the set you're interested in.

Source code of the timing program:

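#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Evict the cache line containing p from the cache hierarchy. */
static inline void
clflush(volatile void *p)
{
    asm volatile ("clflush (%0)" :: "r"(p) : "memory");
}

/* Read the time-stamp counter; RDTSC returns it split across EDX:EAX. */
static inline uint64_t
rdtsc(void)
{
    uint32_t a, d;
    asm volatile ("rdtsc" : "=a" (a), "=d" (d));
    return a | ((uint64_t)d << 32);
}

/* Zero-initialized global (BSS); this is the location being timed. */
volatile int i;

/* Time a single read of i and print the elapsed TSC ticks. */
static inline void
test(void)
{
    uint64_t start, end;
    volatile int j;

    start = rdtsc();
    j = i;
    end = rdtsc();
    (void)j;
    /* PRIu64 keeps the format correct on both i386 and x86_64. */
    printf("took %" PRIu64 " ticks\n", end - start);
}

int
main(void)
{
    test();             /* make sure i is cached */
    test();             /* cached read */
    printf("flush: ");
    clflush(&i);
    test();             /* read after the flush: a miss */
    test();             /* cached again */
    return 0;
}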
Ben Jackson
  • Thanks, this looked good at first but ended up being too inaccurate, possibly due to problems in my code. – Roman A. Taycher Jul 27 '11 at 10:52
  • @Ben Is there any special requirement (compilation, system settings, etc.) that has to be met to make this code work? In the "flush:" line I get tick counts of the same order of magnitude as the cached accesses. I'm using macOS 10.15.5 with gcc-9 and clang 11 (both tested). – Patrick Jun 19 '20 at 09:33
3

I don't know of any generic command to get the cache state, but there are ways:

  1. I guess this is the easiest: if you have your kernel module, just disassemble it and look for cache invalidation / flushing instructions (at the moment only three come to mind: WBINVD, CLFLUSH, INVD).
  2. You said it is for i386, but I guess you don't mean an 80386. The problem is that there are many different processors with different extensions and features. E.g. the newest Intel series includes performance/profiling registers for the cache system, which you can use to evaluate cache misses/hits/number of transfers and similar.
  3. Similar to 2, and very dependent on the system you have: in a multiprocessor configuration you could use the second processor to watch the first one's cache state through the cache coherence protocol (MESI).
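For the performance-register route in item 2, one way to reach the counters from user space on Linux is the `perf_event_open` system call (kernel 2.6.31 and later). The sketch below is only an illustration under assumptions: `count_cache_misses` and `sample_probe` are hypothetical names, the generic `PERF_COUNT_HW_CACHE_MISSES` event is used (what it maps to depends on the CPU, often last-level cache misses), and the call can fail if the kernel or its permission settings do not allow it, in which case the helper returns -1.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical probe: touch one byte per 64-byte line of a private buffer. */
static volatile char probe_buf[1 << 16];

void
sample_probe(void)
{
    for (size_t k = 0; k < sizeof probe_buf; k += 64)
        (void)probe_buf[k];
}

/*
 * Run work() and return the number of user-space cache misses the hardware
 * counter reported for this thread, or -1 if the counter is unavailable.
 */
long long
count_cache_misses(void (*work)(void))
{
    struct perf_event_attr attr;
    uint64_t count = 0;
    long long result = -1;
    int fd;

    assert(work != NULL);
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_HW_CACHE_MISSES;
    attr.disabled = 1;          /* start stopped, enable around work() */
    attr.exclude_kernel = 1;    /* count only our own user-space reads */
    attr.exclude_hv = 1;

    /* pid 0 = this thread, cpu -1 = any CPU, no group, no flags */
    fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    if (read(fd, &count, sizeof count) == (ssize_t)sizeof count)
        result = (long long)count;
    close(fd);
    return result;
}
```

A test could warm the buffer with `sample_probe()`, trigger the kernel module, and then compare `count_cache_misses(sample_probe)` against a run without the trigger. A much higher miss count after the trigger would suggest the lines were flushed.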

You mentioned WBINVD: as far as I know, that always flushes the complete cache, i.e. all cache lines.

flolo
  • I have the source code, I'm trying to write a test to confirm what happens under certain conditions. Atom both dual and single core. – Roman A. Taycher Jul 03 '11 at 00:18
  • If you have the source code, can't you just look through it to check if the WBINVD instruction is ever issued? – jalf Jul 07 '11 at 13:36
  • +1 for the pc ideas -- Performance counters could deliver the info the question asks for. – TheBlastOne Jul 07 '11 at 21:37
  • jalf: 1) I'm trying to do semi-automated testing, not code review (and the code could change; hopefully the test will still work). 2) WBINVD is invoked in some places based on policy. I'm trying to test different conditions to see if it gets invoked. – Roman A. Taycher Jul 08 '11 at 03:45
0

It may not be an answer to your specific question, but have you tried using a cache profiler such as Cachegrind? It can only be used to profile userspace code, but you might be able to use it nonetheless, by e.g. moving the code of your function to userspace if it does not depend on any kernel-specific interfaces.

It might actually be more effective than trying to ask the processor for information that may or may not exist and that will probably be affected by your mere asking about it - yes, Heisenberg was way before his time :-)

thkala