I learned that there are two cache maintenance operations on arm64: 'clean' and 'invalidate'. A 'clean' writes dirty data back to memory (and, in the 'clean and invalidate' form, also invalidates the cache line), while 'invalidate' only invalidates the cache line.
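
To pin down the terminology, here is a toy user-space model of a single cache line. It is only a sketch of the semantics as I understand them from the Arm terminology (clean writes back and keeps the line, invalidate drops the line without writing back, clean and invalidate does both). It is not kernel code and says nothing about timing; every name in it is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one cache line caching one byte of memory (illustrative only). */
struct toy_line {
    bool valid;      /* line currently holds a copy of *mem */
    bool dirty;      /* cached copy is newer than *mem */
    uint8_t cached;  /* value held in the cache */
    uint8_t *mem;    /* backing memory location */
};

/* CPU store that allocates/hits in the cache: memory is not updated yet. */
static void toy_store(struct toy_line *l, uint8_t v)
{
    l->cached = v;
    l->valid = true;
    l->dirty = true;
}

/* 'clean': write dirty data back to memory, keep the line valid. */
static void toy_clean(struct toy_line *l)
{
    assert(l->mem);
    if (l->valid && l->dirty) {
        *l->mem = l->cached;
        l->dirty = false;
    }
}

/* 'invalidate': drop the line without writing anything back. */
static void toy_invalidate(struct toy_line *l)
{
    l->valid = false;
    l->dirty = false;
}

/* 'clean and invalidate': write back, then drop the line. */
static void toy_clean_invalidate(struct toy_line *l)
{
    toy_clean(l);
    toy_invalidate(l);
}

/* CPU load: on a miss, refill the line from memory. */
static uint8_t toy_load(struct toy_line *l)
{
    assert(l->mem);
    if (!l->valid) {
        l->cached = *l->mem;
        l->valid = true;
        l->dirty = false;
    }
    return l->cached;
}
```

In this model, a store followed by toy_invalidate() loses the stored value (the next toy_load() returns the old memory contents), whereas a store followed by toy_clean() or toy_clean_invalidate() makes it reach memory.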

I also learned that there are two APIs that perform these operations on the D-cache: '__flush_dcache_area' (which uses the 'DC CIVAC' instruction, i.e. clean and invalidate) and '__inval_dcache_area' (which uses the 'DC IVAC' instruction).
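
A note on how much work each call issues: as far as I understand, both helpers walk the buffer one cache line at a time and issue one DC instruction per line, so the instruction count is the same for both. Below is a minimal user-space sketch of that line count. The 64-byte line size used in the example is an assumption (the kernel reads the real size from CTR_EL0), and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of cache lines touched by the byte range [start, start + size).
 * 'line' is the cache line size in bytes and must be a power of two.
 * Hypothetical helper for illustration; not a kernel API.
 */
static size_t lines_covered(uintptr_t start, size_t size, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    if (size == 0)
        return 0;

    /* Round the first and last byte addresses down to a line boundary. */
    uintptr_t mask = ~(uintptr_t)(line - 1);
    uintptr_t first = start & mask;
    uintptr_t last = (start + size - 1) & mask;

    return (size_t)((last - first) / line) + 1;
}
```

For example, with 64-byte lines an aligned 8KB buffer covers 128 lines, while an 8-byte range starting at offset 60 straddles two lines.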

From my understanding, the 'invalidate' operation should be faster than 'clean', since it does not need to touch real memory. 'invalidate' just marks a flag in the cache line, whereas 'clean' has to write to real memory and then mark a flag in the cache line. So I believe 'clean' does much more work than 'invalidate'. To verify this, I wrote some test code in a kernel module, as below:

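#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/printk.h>
#include <linux/string.h>
#include <linux/timekeeping.h>
#include <linux/vmalloc.h>
#include <asm/cacheflush.h>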
static int mem_size = 8 * 1024;
module_param_named(msize, mem_size, int, 0644);
static int cache_op = 0;
module_param_named(op, cache_op, int, 0644);
static int op_cnt = 100;
module_param_named(cnt, op_cnt, int, 0644);

static void call_cache_time_routine(void)
{
    u64 i, j, t1, t2, total_time = 0;
    unsigned char *mem = vmalloc(mem_size);
    if (!mem) {
        printk("Failed to vmalloc!\n");
        return;
    }   

    for (j = 0; j < op_cnt; j++) {
        memset(mem, 0xa5, mem_size);
    
        for (i = 0; i < mem_size; i++) {
            mem[i] = (mem_size + i) % 256;
        }   
    
        t1 = ktime_get_ns();
    
        if (cache_op == 0) {
            __flush_dcache_area(mem, mem_size);
        } else {
            __inval_dcache_area(mem, mem_size);
        }   
    
        t2 = ktime_get_ns();
        total_time += t2 - t1; 

        printk("mem size %d, op %s, time consumed %lluns, mem[j] = %d\n",
            mem_size, cache_op == 0 ? "flush" : "inval", t2 - t1, mem[j % mem_size]);
    }

    vfree(mem); /* pairs with vmalloc() above */

    printk("Average %lluns per time\n", total_time / op_cnt);
}

However, the test result surprised me. The time spent on 'invalidate' is even slightly longer than on 'clean' (something like 1001ns vs. 999ns, so the gap is small). To exclude factors that might affect the cache's behavior, I stopped all user programs except the test code above. I also tested buffer sizes from 1KB to 8MB, and the results were more or less the same. I even tried calling mb() before the cache operation, but it didn't help.
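
With two results this close, the mean alone may hide noise from individual iterations. Below is a minimal user-space sketch of the same averaging the module does (total divided by count), with the minimum added as a second summary. The struct and function names are hypothetical and not part of the module, and the sample values in the note after the code are invented purely to show the arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Summary of per-iteration timings, in nanoseconds. */
struct sample_stats {
    uint64_t min;   /* fastest single iteration */
    uint64_t mean;  /* integer mean, truncated like total_time / op_cnt */
};

/* Summarise n timing samples; n must be non-zero. */
static struct sample_stats summarize_ns(const uint64_t *ns, size_t n)
{
    assert(ns != NULL && n > 0);

    struct sample_stats s = { .min = ns[0], .mean = 0 };
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (ns[i] < s.min)
            s.min = ns[i];
        total += ns[i];
    }
    s.mean = total / n;
    return s;
}
```

For the invented samples {1000, 998, 1005, 1001}, this gives a minimum of 998 and a mean of 1001, which shows how one slow iteration can shift the mean by more than the gap being compared.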

Can anyone explain this to me? Why does 'invalidate' cost more (or, let's say, more or less the same) time than 'clean'?

Thanks in advance ~
