If address-B access tends to be followed by address-C access, can the cache recognize it?

Question

I have a memory access pattern in my program like ...

b1->c1  (b and c are address.)
//.... do something else ....
b2->c2
//.... do something else ....
b3->c3
....

Is the compiler/cache/CPU smart enough to recognize that :
when I load b, it should (prepare to) load corresponding c?

More specifically : Can it somehow predict my access pattern and optimize it in some ways?
How much the advantage would be, roughly?

I created a test case. The result shows that it can't learn at run time.
(In real cases, B has a lot of fields but tend to -> only c.)

class C{
    public: int data=0;
};
class B{
    public: C* c;   int accu=0;
    public: B(){
        c=new C();
    }
    public: void doSomething(){
        accu+=c->data; //do something about c
    }
};
int main() {
    using namespace std;
    const int NUM=1000000;
    B* bs[NUM];
    for(int n=0;n<NUM;n++){
        bs[n]=new B();
    }
    for(int loop=0;loop<20;loop++){
        double accumulator=0;
        for(int n=0;n<NUM;n++){
            int iSecret = rand() % NUM;
            clock_t begin = clock();
            bs[iSecret]->doSomething();
            clock_t end = clock();
            accumulator+=double(end - begin);
        }
        double elapsed_secs = accumulator;
        std::cout<<elapsed_secs<<std::endl;
    }
}

Print (time per loop)

If it can learn, later loops should use less time than previous ones.

I don't think it can utilize Spatial locality, because c's address is far away.

what you expect by *it can learn*? Always load c with b would not be more efficient in recent CPU pipeline AFAICT. — apple apple, Feb 23 '17 at 03:36

score 1 · Accepted Answer · answered Feb 23 '17 at 07:59

1

In your case bs[iSecret] is one address which tries to access some other address c via doSomething()

This is user level logic which user only can optimize by appropriately placing the data pointed by your b and c so as to take advantage of spatial locality.

As a simple example, would you expect compiler to optimize this code?

int a[100][100];
for(int i = 0; i < 100; ++i)
for(int j = 0; j < 100; ++j)
    cout << a[j][i] << endl;

However, would it had been a case of conditional construct like

address X:  if(condition)
            {
address Y:     //dosomething_A
            }
            else
            {
address Z:    //dosomething_B
            }

Here, if condition is at address X and so on..

In such conditional constructs the compiler can generate code which can minimize the penalties of stall cycle(due to branch) on pipelined processors.

Also, pipelined processors can learn about your branches using Branch_predictor at runtime.

answered Feb 23 '17 at 07:59

sameerkn

2,209
1
12
13

I am not so good at English, may you answer in a non-question statement, please? Thank! – javaLover Feb 23 '17 at 08:39
All am saying is user level logic cannot be optimized/predicted by compiler/processor. But, compiler can optimize the programming language constructs since it know about those constructs. Similarly CPU can optimize/predict the instruction flow because it deals with those things. – sameerkn Feb 23 '17 at 08:55
I am trying to understand it. Does your answer mean: "The answer is no. I should make b and c have similar address. To achieve it, I have to control manually, e.g. by your first example." ? – javaLover Feb 23 '17 at 09:02
1

Sorry, if my post confused you. Correct, answer is no i.e compiler/cpu **cannot** optimize/predict the code which posted. You need to manually control. – sameerkn Feb 23 '17 at 09:06
Thank. What make you believe in that way? In other words, may you provide some references/evidence please? It is easy to find links show the compiler/CPU can be smart in a certain way, but it is hard to prove that a certain optimization does not exist. – javaLover Feb 23 '17 at 09:08
1

In the code you posted, all your `b1 and c1`, `b2 and c2`.... are dynamically allocated so all these might be stored on different location inside virtual address space of process and hence on different virtual pages and hence on different memory `frame`. In normal paging system at OS level things are loaded as per demand. So if one page has all of your `b1, b2...` then OS won't automatically load the pages containing corresponding `c1, c2...` unless you call `dosomething()`. – sameerkn Feb 23 '17 at 09:17

If address-B access tends to be followed by address-C access, can the cache recognize it?

1 Answers1