I'm reading What Every Programmer Should Know About Memory pdf by Ulrich Drepper. At the beginning of part 6 theres's a code fragment:
#include <emmintrin.h>
void setbytes(char *p, int c)
{
__m128i i = _mm_set_epi8(c, c, c, c,
c, c, c, c,
c, c, c, c,
c, c, c, c);
_mm_stream_si128((__m128i *)&p[0], i);
_mm_stream_si128((__m128i *)&p[16], i);
_mm_stream_si128((__m128i *)&p[32], i);
_mm_stream_si128((__m128i *)&p[48], i);
}
With such a comment right below it:
Assuming the pointer
p
is appropriately aligned, a call to this function will set all bytes of the addressed cache line toc
. The write-combining logic will see the four generated movntdq instructions and only issue the write command for the memory once the last instruction has been executed. To summarize, this code sequence not only avoids reading the cache line before it is written, it also avoids polluting the cache with data which might not be needed soon.
What bugs me is the that in comment to the function it is written that it "will set all bytes of the addressed cache line to c" but from what I understand of stream intrisics they bypass caches - there is neither cache reading nor cache writing. How would this code access any cache line? The second bolded fragment says sotheming similar, that the function "avoids reading the cache line before it is written". As stated above I don't see any how and when the caches are written to. Also, does any write to cache need to be preceeded by a cache write? Could someone clarify this issue to me?