I'm putting together a small patch for the cachegrind/callgrind tool in valgrind which will auto-detect, using completely generic code, CPU instruction and cache configuration (right now only x86/x64 auto-configures, and other architectures don't provide CPUID type configuration to non-privileged code). This code will need to execute entirely in a non-privileged context i.e. pure user mode code. It also needs to be portable across very different POSIX implementations, so grokking /proc/cpuinfo won't do as one of our destination systems doesn't have such a thing.
Detecting the frequency of the CPU, the number of caches, their sizes, and even cache line size can all be done using 100% generic POSIX code which has no CPU-specific opcodes whatsoever (just a lot of reasonable assumptions, such as that adding two numbers together, if without memory or register dependency stalls, probably will be executed in a single cycle). This part is fairly straightforward.
What isn't so straightforward, and why I ask StackOverflow, is how to detect cache line associativity for a given cache? Associativity is how many places in a cache can contain a given cache line from main memory. I can see that L1 cache associativity could be detected, but L2 cache? Surely the L1 associativity gets in the way?
I appreciate this is probably a problem which cannot be solved. But I throw it onto StackOverflow and hope someone knows something I don't. Note that if we fail here, I'll simply hard code in an associativity default of four way, assuming it wouldn't make a huge difference to results.
Thanks,
Niall