
As per my understanding, when we call clflush(&Array1[i]) we manually evict the cache line that Array1[i] resides in, so it is guaranteed that Array1[i] is no longer present in the cache. The next time we access Array1[i] after the clflush, it has to be fetched from memory again, so the access time is higher than it was before the clflush.

Is there a way to check whether the processor cache has been flushed recently?
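The structure of the measurement I am doing is roughly the sketch below. It is a minimal sketch only: my real code uses the serialized rdtsc from the link at the end of the question, and __rdtscp plus the gcc/clang intrinsics from <x86intrin.h> are used here just to keep it short:

```c
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence, __rdtscp (gcc/clang) */

static uint8_t Array1[4096];

/* Time a single read of *p. */
static inline uint64_t timed_read(volatile uint8_t *p)
{
    unsigned aux;
    uint64_t t0 = __rdtscp(&aux);
    (void)*p;                                      /* the access being timed */
    uint64_t t1 = __rdtscp(&aux);
    return t1 - t0;
}

int main(void)
{
    int i = 0;

    timed_read(&Array1[i]);                        /* warm-up: bring the line into the cache */
    for (int n = 0; n < 4; n++) {
        if (n == 2) {
            _mm_clflush(&Array1[i]);               /* evict the line from the whole hierarchy */
            _mm_mfence();                          /* wait for the flush to complete */
            printf("flush: ");
        }
        printf("took %llu ticks\n", (unsigned long long)timed_read(&Array1[i]));
    }
    return 0;
}
```

Timing a few consecutive accesses to Array1[i] and issuing the clflush before one of them, I get: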

took 81 ticks

took 81 ticks

flush: took 387 ticks // result is as expected

took 72 ticks

Now suppose Array1[i] maps to cache set 'P' of an 8-way set-associative cache (W = 8). I am assuming that 8 elements of Array1 can occupy all 8 cache lines (ways) of that same set 'P', and that I have filled those 8 blocks by accessing the appropriate other elements of Array1.

Now let some other element, Array2[j], also map to set 'P', so that it occupies exactly one block of that set.
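To make the set mapping I am assuming explicit, here is a small sketch. The cache geometry (64-byte lines, 64 sets) is an assumption on my side, not something I have measured, and for L2/L3 the index is computed from physical addresses, so this only holds directly for L1:

```c
#include <stdint.h>

#define LINE 64          /* assumed cache-line size */
#define SETS 64          /* assumed number of sets (typical L1d) */

/* Set index of an address: (addr / line_size) % number_of_sets. */
static inline unsigned set_index(const void *addr)
{
    return (unsigned)(((uintptr_t)addr / LINE) % SETS);
}

/* If Array1 starts on a set-0 boundary (e.g. page aligned), then the elements
 * Array1[P*LINE + w*SETS*LINE], w = 0..7, all have set_index == P, and so does
 * Array2[P*LINE] under the same alignment assumption. */
```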

Here are my questions.

Question 1:

If I access Array2[j], then as per my understanding of the LRU replacement strategy, one element of Array1 will be evicted from one block of set 'P' to make room for Array2[j], which maps to the same set. Which cache line/block of the set is evicted is determined by the LRU policy.

Am I correct?

Question 2: If an Array1 element is evicted from the cache to make room for Array2[j] (both Array1 and Array2 are in the L2 cache), will it be stored in a victim cache, so that next time it is loaded from that victim cache? Or will the Array1 element be loaded from a higher level of the hierarchy, not from a victim cache?

Question 3:

If an Array1 element is evicted to make room for Array2[j], and we then measure the access time for that Array1 element, will it always be higher than the access time measured before Array2[j] was accessed?
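In other words, the experiment I have in mind looks roughly like the sketch below. It is only a sketch: the geometry constants (64-byte lines, 64 sets, 8 ways) and the page alignment are assumptions, real L2/L3 indexing uses physical addresses, the replacement policy may only approximate true LRU, and __rdtscp is used just to keep it short:

```c
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>

#define LINE 64
#define SETS 64
#define WAYS 8

/* Page-aligned so that offset P*LINE falls in set P (gcc/clang attribute). */
static uint8_t Array1[WAYS * SETS * LINE] __attribute__((aligned(4096)));
static uint8_t Array2[SETS * LINE]        __attribute__((aligned(4096)));

static inline uint64_t timed_read(volatile uint8_t *p)
{
    unsigned aux;
    uint64_t t0 = __rdtscp(&aux);
    (void)*p;                                    /* the access being timed */
    uint64_t t1 = __rdtscp(&aux);
    return t1 - t0;
}

int main(void)
{
    int P = 5;                                   /* arbitrary set index */

    /* Step 1: touch 8 lines of Array1 that all map to set P (fill the 8 ways). */
    for (int w = 0; w < WAYS; w++)
        timed_read(&Array1[w * SETS * LINE + P * LINE]);

    /* Step 2: access Array2[j], which also maps to set P; this should evict
       one Array1 line (the least recently used one, if the policy is LRU). */
    timed_read(&Array2[P * LINE]);

    /* Step 3: re-time the Array1 lines; the evicted one should be slower. */
    for (int w = 0; w < WAYS; w++)
        printf("way %d: took %llu ticks\n", w,
               (unsigned long long)timed_read(&Array1[w * SETS * LINE + P * LINE]));
    return 0;
}
```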

If I replace the clflush call with an access to Array2[j], which should have the same effect (evicting the Array1 element from that cache line), then for the cache lines where the Array1 element is replaced by the access to Array2[j] I get:

took 81 ticks

took 81 ticks

**ACCESS ARRAY2[j] element: took 75 ticks**

took 72 ticks

I am confused by **ACCESS ARRAY2[j] element: took 75 ticks**. Why is it lower? What is the correct answer? I was expecting a higher access time for the block that is evicted by the Array2 element.

I am using a Core i7 machine, and the correct use of rdtsc that I follow is described here: clflush() in i3 or i7 processors
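For reference, the kind of serialized measurement that link describes is, as far as I understand it, along these lines. This is a sketch of the widely quoted CPUID/RDTSC ... RDTSCP/CPUID recipe; the linked answer may differ in detail, and GCC/Clang inline asm on x86-64 is assumed:

```c
#include <stdio.h>
#include <stdint.h>

/* Serialize, then read the TSC at the start of the measured region. */
static inline uint64_t tsc_begin(void)
{
    uint32_t lo, hi;
    asm volatile("cpuid\n\t"            /* serializing barrier */
                 "rdtsc"
                 : "=a"(lo), "=d"(hi)
                 :
                 : "%rbx", "%rcx");
    return ((uint64_t)hi << 32) | lo;
}

/* Read the TSC at the end (rdtscp waits for earlier instructions to finish),
   then serialize again so later instructions cannot be reordered upwards. */
static inline uint64_t tsc_end(void)
{
    uint32_t lo, hi;
    asm volatile("rdtscp\n\t"
                 "mov %%eax, %0\n\t"
                 "mov %%edx, %1\n\t"
                 "cpuid"
                 : "=r"(lo), "=r"(hi)
                 :
                 : "%rax", "%rbx", "%rcx", "%rdx");
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    volatile int x = 0;
    uint64_t start = tsc_begin();
    x++;                                /* the work being measured */
    uint64_t end = tsc_end();
    printf("took %llu ticks\n", (unsigned long long)(end - start));
    return 0;
}
```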

Can anyone explain? Thanks in advance.

  • As I explained in one of the previous versions of this question - you *can't* measure a single cache access with rdtsc, the overhead (especially when serializing with cpuid) is huge compared to the actual time you measure. Measure this once over all the sets and then check if it makes sense. – Leeor Oct 22 '13 at 18:53
  • @Leeor, first of all, thanks. What do you mean by overhead? Since rdtsc "counts the number of cycles since reset", if I take the difference End-Start, what overhead am I measuring, as per your comment? Can you explain it, please? – bholanath Oct 22 '13 at 19:35
  • Another point: if I access the same element multiple times to reduce the overhead, as per your comment on a previous version of the question, I am sure that after the 1st access it will be loaded into the L1 cache, and the access time will be the L1 access time since the element is already in the cache. So what is the use of multiple accesses after it has been loaded into the cache? – bholanath Oct 22 '13 at 19:38
  • There's at least one cpuid between the rdtsc's, that's a long serializing flow. And I didn't say access the same element - you've shown how to check set[p], move on to set p+1, p+2, .. – Leeor Oct 22 '13 at 19:47

0 Answers