Is there a syntax to force a C compiler to use a memory operand directly?
In the good old asm days we simply wrote in the instruction where to take each operand from: a 'real' register or a memory pointer (the location pointed to by an address).
But with the intrinsics pseudo-asm for C I do not see a way to force the compiler to use a memory pointer in the instruction, i.e. to skip loading the data from memory (cache) into a 'register' first, which would push content already held in the register file back out to cache and cause a reload penalty.
I understand it is easy for the programmer to simply pass a 'variable' operand to the intrinsic and let the compiler decide whether to load it from memory first or use it directly (if possible).
Current task: I want to calculate the SAD of a sequence of 8x8 8-bit blocks on an AVX2 CPU with a 512-byte register file (16 ymm 'registers' of 32 bytes each). So 8 source blocks of 8x8 8-bit samples (64 bytes, i.e. 2 ymm registers, each) completely fill the available AVX2 register file.
I want to load source blocks into the whole register file and test different 'ref' locations from memory against these source blocks, using each ref location only once. So I want to prevent the CPU from loading ref blocks from cache into the register file, and use a 'memory operand' in the SAD instruction instead.
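To be concrete about what is computed per block, here is a minimal scalar sketch of SAD for one 8x8 8-bit block. The function name `sad_8x8_scalar` and the byte-stride parameters are assumptions for illustration only, not the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference: sum of absolute differences (SAD)
   between one 8x8 block of 8-bit samples and a reference block.
   Strides are in bytes; the name and layout are assumptions. */
static uint32_t sad_8x8_scalar(const uint8_t *src, ptrdiff_t src_stride,
                               const uint8_t *ref, ptrdiff_t ref_stride)
{
    assert(src != NULL && ref != NULL);
    uint32_t sad = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            sad += (uint32_t)abs((int)src[x] - (int)ref[x]);
        src += src_stride;   /* advance to the next row of each block */
        ref += ref_stride;
    }
    return sad;
}
```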
With asm we simply write something like
(load all 16 ymm registers with src)
vpsadbw ymm0, ymm0, [ref_base_address_register + some_offset...]
But in C source with intrinsics it is
__m256i src = load_src(src_pointer);
__m256i ref = load_ref(ref_pointer);
__m256i sad_result = _mm256_sad_epu8(src, ref);
I see no way to tell the compiler to use a memory operand, with something like
__m256i src = load_src(src_pointer);
__m256i sad_result = _mm256_sad_epu8(src, *ref_pointer);
Or does it depend on the 'task size': if the compiler runs out of available registers, will it automatically switch to the memory-operand version, so that the programmer can write
__m256i sad_result = _mm256_sad_epu8(*(__m256i*)src_pointer, *(__m256i*)ref_pointer);
and expect the compiler to load one of the 2 operands into the register file and use the other directly from memory?
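A minimal sketch of the pattern asked about, assuming GCC or Clang on x86-64: the source vector is loaded once, and each ref location is passed to `_mm256_sad_epu8` through `_mm256_loadu_si256`. Intrinsics do not let the programmer pick the operand form. With optimization enabled, compilers commonly fold such a single-use load into the memory operand of `vpsadbw`, but that is not guaranteed and has to be checked in the generated assembly. The function name `sad32_many`, its parameters, and the scalar fallback used when AVX2 is not enabled at compile time are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical helper: SAD of one 32-byte source vector against n_refs
   reference positions, ref + i * ref_step, each read exactly once. */
static void sad32_many(const uint8_t *src, const uint8_t *ref,
                       size_t n_refs, ptrdiff_t ref_step, uint32_t *out)
{
    assert(src != NULL && ref != NULL && out != NULL);
#if defined(__AVX2__)
    /* Load the source once; it can stay in a ymm register for the loop. */
    const __m256i s = _mm256_loadu_si256((const __m256i *)src);
    for (size_t i = 0; i < n_refs; i++) {
        const uint8_t *p = ref + (ptrdiff_t)i * ref_step;
        /* The loaded value is used only here, so the compiler is free
           (not obliged) to emit vpsadbw with a memory operand. */
        __m256i sad = _mm256_sad_epu8(s, _mm256_loadu_si256((const __m256i *)p));
        /* Reduce the four 64-bit partial sums to a single value. */
        __m128i sum = _mm_add_epi64(_mm256_castsi256_si128(sad),
                                    _mm256_extracti128_si256(sad, 1));
        sum = _mm_add_epi64(sum, _mm_unpackhi_epi64(sum, sum));
        out[i] = (uint32_t)_mm_cvtsi128_si32(sum);
    }
#else
    /* Scalar fallback so the sketch also builds without -mavx2. */
    for (size_t i = 0; i < n_refs; i++) {
        const uint8_t *p = ref + (ptrdiff_t)i * ref_step;
        uint32_t sad = 0;
        for (int k = 0; k < 32; k++)
            sad += (uint32_t)abs((int)src[k] - (int)p[k]);
        out[i] = sad;
    }
#endif
}
```

Compiling with something like `gcc -O2 -mavx2 -S` and looking for a `vpsadbw` whose source operand is a memory address shows which form the compiler actually chose for this loop.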