I would like to test the cache on my 32-bit x86 (IA-32) Intel CPU.

I referred to the document below. I am new to assembly programming and to cache concepts, so I need some help.

https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf

I would like to enable and disable the cache, invalidate it, write it back, and switch it to write-through mode.

Can you please help me with C inline assembly code?

I came across some instructions such as clflush and wbinvd, but I am not sure when or how to use them.

Also, how can I verify that the cache enable/disable/invalidate/writeback functions work?

I went through the following post, but it seems to target 64-bit x86, and most of the assembly instructions do not match.
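A minimal sketch of one way to verify this with a timing test, assuming GCC or Clang on x86 with SSE2 available (`-msse2` for 32-bit builds) and a 64-byte cache line. It builds a linked list with one node per cache line, walks it while the lines are hot, flushes each line with `clflush`, and walks it again. The names `build_list`, `walk`, `flush_list`, `timed_walk` and `measure` are invented for illustration, and the memory is assumed to be mapped write-back.

```c
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc, _mm_lfence, _mm_mfence, _mm_clflush */

#define LINE_SIZE  64   /* assumed cache-line size in bytes */
#define NODE_COUNT 256  /* a couple hundred nodes; must stay a power of two */

/* One node per cache line, so every step of the walk touches a new line. */
struct node {
    struct node *next;
    char pad[LINE_SIZE - sizeof(struct node *)];
};

static struct node nodes[NODE_COUNT] __attribute__((aligned(LINE_SIZE)));

/* Link node i to node (5*i + 1) mod NODE_COUNT. For a power-of-two count this
 * is a full-period sequence, so the list forms one cycle through all nodes
 * in a non-sequential order. */
static void build_list(void)
{
    for (size_t i = 0; i < NODE_COUNT; i++)
        nodes[i].next = &nodes[(5 * i + 1) % NODE_COUNT];
}

/* Follow `steps` links; the volatile read keeps every load in the loop. */
static struct node *walk(struct node *p, size_t steps)
{
    while (steps--)
        p = *(struct node *volatile *)&p->next;
    return p;
}

/* Write back and invalidate every cache line that holds a node. */
static void flush_list(void)
{
    for (size_t i = 0; i < NODE_COUNT; i++)
        _mm_clflush(&nodes[i]);
    _mm_mfence();  /* let the flushes complete before timing starts */
}

/* TSC ticks for one full walk; lfence keeps rdtsc from being reordered. */
static uint64_t timed_walk(void)
{
    _mm_lfence();
    uint64_t start = __rdtsc();
    _mm_lfence();
    struct node *volatile end = walk(&nodes[0], NODE_COUNT);
    _mm_lfence();
    uint64_t stop = __rdtsc();
    (void)end;
    return stop - start;
}

struct walk_times { uint64_t hot, cold; };

/* Time a walk with the list cached, then again after flushing it. */
static struct walk_times measure(void)
{
    struct walk_times t;
    build_list();
    walk(&nodes[0], NODE_COUNT);  /* warm the cache */
    t.hot = timed_walk();
    flush_list();
    t.cold = timed_walk();
    return t;
}
```

With caching enabled, `cold` should come out much larger than `hot`. With caching disabled, both walks go to memory and the two numbers should be close. Swapping the `clflush` loop for a single privileged `wbinvd` would test a whole-cache flush the same way.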

enable/disable cache on intel 64bit machine: CD bit always set?

int cache_test(int opt) {

    unsigned int cr0;

    switch(opt) {
    case 0:
        __asm__ volatile(
            "pushl %%eax\n\t"
            "movl %%cr0,%%eax\n\t"
            "orl $0x60000000,%%eax\n\t"
            "movl %%eax,%%cr0\n\t"
            "movl %%cr0, %0\n\t"
            "wbinvd\n\t"
            "popl  %%eax"
            : "=r"(cr0)
            :
            :);
        printf("printf: disable cache cr0 0x%x\n", cr0);
        break;

    case 1:
        __asm__ volatile(
            "pushl %%eax\n\t"
            "movl %%cr0,%%eax\n\t"
            "andl $0x9fffffff,%%eax\n\t"
            "movl %%eax,%%cr0\n\t"
            "movl %%cr0, %0\n\t"
            "popl  %%eax"
            : "=r"(cr0)
            :
            :);
        printf("printf: enable cache; cr0 0x%x\n", cr0);
        break;

    case 2:
        __asm__ volatile(
            "pushl %%eax\n\t"
            "movl %%cr0, %%eax\n\t"
            "movl %%eax, %0\n\t"
            "popl %%eax"
            : "=r"(cr0)
            :
            :);
        printf("printf: XENMEM_show_cache_status cro value is 0x%x\n", cr0);
        return (long)cr0;
    }

    return cr0;
}

I ported the code to a 32-bit IA CPU. Does this look correct for enabling and disabling the cache?
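One hazard in the code above: with an `"=r"` output the compiler is free to pick `eax` for `%0`, in which case the final `popl %%eax` overwrites the value just read from CR0. That may explain the `cr0=0x0`, `0x2` and `0x1` values reported later in the comments. A hedged sketch of the same operations without push/pop, letting the compiler choose the scratch register, could look like the following. The helper names are invented for illustration. The code must run at privilege level 0, and it sets CD while clearing NW, which is how I read the SDM's cache-disable procedure, rather than setting both bits as the question does.

```c
#define CR0_NW (1UL << 29)  /* Not Write-through */
#define CR0_CD (1UL << 30)  /* Cache Disable */

/* Pure helpers: compute the CR0 value to write for each cache state. */
static inline unsigned long cr0_cache_off(unsigned long cr0)
{
    return (cr0 | CR0_CD) & ~CR0_NW;   /* no-fill mode: CD=1, NW=0 */
}

static inline unsigned long cr0_cache_on(unsigned long cr0)
{
    return cr0 & ~(CR0_CD | CR0_NW);   /* normal caching: CD=0, NW=0 */
}

#if defined(__i386__) || defined(__x86_64__)
/* Privileged wrappers (ring 0 only). The operand is the compiler-chosen
 * register %0, so no manual save/restore of eax is needed. */
static inline unsigned long read_cr0(void)
{
    unsigned long value;
    __asm__ volatile("mov %%cr0, %0" : "=r"(value));
    return value;
}

static inline void write_cr0(unsigned long value)
{
    __asm__ volatile("mov %0, %%cr0" : : "r"(value) : "memory");
}

static inline void wbinvd(void)
{
    /* Write back all modified lines, then invalidate the caches. */
    __asm__ volatile("wbinvd" : : : "memory");
}

/* Disable caching: enter no-fill mode first, then flush. Returns new CR0. */
static inline unsigned long cache_disable(void)
{
    write_cr0(cr0_cache_off(read_cr0()));
    wbinvd();
    return read_cr0();
}

/* Re-enable caching. Returns new CR0. */
static inline unsigned long cache_enable(void)
{
    write_cr0(cr0_cache_on(read_cr0()));
    return read_cr0();
}
#endif
```

After `cache_disable()` the returned value should have bit 30 set, and after `cache_enable()` bits 29 and 30 should both be clear. Fully preventing caching also involves the MTRRs, which this sketch does not touch.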

Titus
  • Have you read the descriptions of these instructions and controls in the Intel SDM? If so, please tell us what parts of the explanations you are having trouble understanding. – prl Apr 11 '18 at 04:20
  • The 64-bit assembly code in the question you linked can be converted to 32-bit code by simply changing rax to eax. (Also, remove the upper 32 bits of the mask constant.) – prl Apr 11 '18 at 04:22
  • Hi prl, thanks for your reply. Yes, I ported that code and want to know whether I ported it correctly. Can you also tell me how to confirm from performance that the cache is enabled or disabled? Is there an assembly instruction for that, or would reading the PC value help to check CPU performance before and after enabling the cache? Thanks again. – Titus Apr 11 '18 at 05:32
  • *Also how can I verify the cache enable/disable/invalidate/writeback functions.* With performance tests. Easy for enable/disable. – Peter Cordes Apr 11 '18 at 05:44
  • Just declare a clobber on `eax` like a normal person, or even better use `%0` as your scratch reg. That push/pop to save/restore `eax` is a total waste of instructions. (I know you just ported it from the 64-bit question; I commented the same thing there.) – Peter Cordes Apr 11 '18 at 05:50
  • Titus, use @prl to notify users when you reply to them. – Peter Cordes Apr 11 '18 at 05:51
  • @PeterCordes Thanks for the suggestions, they are really useful. How can we test the performance before and after enabling the cache? I tried the `rdpmc` instruction to check the performance, but it returns 0 all the time. Can you please help me with this? – Titus Apr 11 '18 at 06:47
  • The difference will be *huge*, you can use `rdtsc` without worrying about frequency scaling. (Or set your governor to `performance` and disable turbo, etc.) Run a pointer-chasing loop like `mov rax, [rax]`. (or write it in C with a `volatile void*ptr_to_self = &ptr_to_self;`) – Peter Cordes Apr 11 '18 at 07:11
  • @PeterCordes `uint64_t rdtsc(){ unsigned int lo,hi; __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi)); return ((uint64_t)hi << 32) | lo; } cache_test(0);//Cache Disable cache_test(2);//Cache Status mem_access(); uint64_t tick = rdtsc(); printk("tick -> %lld \n", tick); cache_test(1);//Cache Enable cache_test(2);//Cache Status mem_access(); tick = rdtsc(); printk("tick 2 -> %lld \n", tick); ` Tried this code, getting 334 value both times. – Titus Apr 11 '18 at 07:52
  • GNU C has a `__builtin_rdtsc`, you don't have to code it up in inline asm yourself. Anyway, your `cache_test` doesn't include a microbenchmark, so you're only measuring the cycles to change `cr0`, not for any loads. Also, if your measurement is tiny, don't forget to use a serializing instruction like `lfence` to stop `rdtsc` from executing out-of-order. – Peter Cordes Apr 11 '18 at 07:54
  • Output : printf: disable cache! cr0=0x0 printf: cache_status! cr0=0x2 tick -> 334 printf: enable cache; cr0=0x1 printf: cache_status! cr0=0x2 tick 2 -> 334 – Titus Apr 11 '18 at 07:54
  • @PeterCordes I am accessing the system memory of the CPU ('mem_access' function) before and after cache enable, both times reading counter. – Titus Apr 11 '18 at 07:56
  • @PeterCordes I will measure the performance by reading some timer registers. Now I would like to invalidate or write back the cache. How can I do this and test it, to show that cache operations work correctly on the CPU? Is there any inline asm code? By the way, I am not running Linux but bare-metal code loaded onto the CPU via JTAG. Thanks for the help. I heard that the MTRR registers must be programmed to make system memory cacheable. I have physical memory at address 0x8000_0000; how can I enable caching for this region using the MTRRs, or by some other means? – Titus Apr 11 '18 at 10:38
  • @PeterCordes Any help ? – Titus Apr 12 '18 at 01:52
  • Set up a long linked list (a couple hundred nodes) with each node in a different cache line (non-sequential to defeat prefetch, but not with a large stride so you don't get conflict misses; L1d is 8-way associative on Intel). Get them hot in cache and then measure how long it takes to walk the list to the end. Then `wbinvd` to flush all caches, or `clflushopt` on each line separately, and time again. It should be *much* slower, like at least 10 times slower. Your MTRR should have the memory you're using set to WB, of course. – Peter Cordes Apr 12 '18 at 02:08
  • Thanks @PeterCordes for your help. – Titus Apr 17 '18 at 06:07
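For the MTRR question raised in the comments, here is a hedged sketch of how the values for one variable-range MTRR pair might be computed so that a region is typed write-back (WB). The region size (256 MiB in the usage note) and the physical address width (36 bits) are assumptions, since the thread gives only the base address 0x8000_0000; the real width comes from CPUID leaf 80000008H where supported. The helper names are invented for illustration. `wrmsr` is privileged, and the full update sequence in the SDM (disabling and flushing caches around the write, setting the enable bit in `IA32_MTRR_DEF_TYPE`) is left out here.

```c
#include <stdint.h>

#define MTRR_TYPE_WB         0x06u         /* write-back memory type */
#define MTRR_PHYSMASK_VALID  (1ull << 11)  /* V bit in IA32_MTRR_PHYSMASKn */
#define IA32_MTRR_PHYSBASE0  0x200u        /* pair n is at 0x200 + 2*n */
#define IA32_MTRR_PHYSMASK0  0x201u
#define IA32_MTRR_DEF_TYPE   0x2FFu        /* bit 11 enables the MTRRs */

/* A variable range must be a power of two in size (at least 4 KiB)
 * and its base must be aligned to that size. */
static inline int mtrr_range_ok(uint64_t base, uint64_t size)
{
    return size >= 4096 && (size & (size - 1)) == 0 && (base & (size - 1)) == 0;
}

/* PHYSBASE value: page-aligned base address plus the memory type. */
static inline uint64_t mtrr_physbase(uint64_t base, unsigned type)
{
    return (base & ~0xFFFull) | type;
}

/* PHYSMASK value: address mask for `size` bytes, limited to phys_bits
 * physical address bits (phys_bits < 64), with the valid bit set. */
static inline uint64_t mtrr_physmask(uint64_t size, unsigned phys_bits)
{
    uint64_t addr_mask = (1ull << phys_bits) - 1;
    return (~(size - 1) & addr_mask & ~0xFFFull) | MTRR_PHYSMASK_VALID;
}

#if defined(__i386__) || defined(__x86_64__)
/* Privileged (ring 0): write a 64-bit value to a model-specific register. */
static inline void wrmsr(uint32_t msr, uint64_t value)
{
    __asm__ volatile("wrmsr"
                     :
                     : "c"(msr), "a"((uint32_t)value),
                       "d"((uint32_t)(value >> 32))
                     : "memory");
}

/* Program variable MTRR pair n as write-back for [base, base + size). */
static inline void mtrr_set_wb(unsigned n, uint64_t base, uint64_t size,
                               unsigned phys_bits)
{
    wrmsr(IA32_MTRR_PHYSBASE0 + 2 * n, mtrr_physbase(base, MTRR_TYPE_WB));
    wrmsr(IA32_MTRR_PHYSMASK0 + 2 * n, mtrr_physmask(size, phys_bits));
}
#endif
```

Under those assumptions, `mtrr_set_wb(0, 0x80000000, 0x10000000, 36)` would write 0x80000006 to MSR 0x200 and 0xFF0000800 to MSR 0x201. Which pair index is free, and how many pairs exist, depends on the platform and what the firmware has already programmed.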
