Say I'm given a hypothetical machine, the "Hypothetical Von-Neumann" machine.
The image says that:
- The cache is 1 kB, and fetching one floating-point value (8 bytes) costs 1 CPU cycle.
- If the data required by the code is not in the cache, 1 kB of data is fetched from RAM (10 MB) at a cost of 150 CPU cycles.
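One way to read these two rules as a formula, assuming that every element read from the cache costs 1 cycle and that a miss adds its 150 cycles on top of that read (both are interpretations, not something the image states):

$$
C_A = R \cdot c_{\mathrm{hit}} + M \cdot c_{\mathrm{miss}}, \qquad c_{\mathrm{hit}} = 1, \quad c_{\mathrm{miss}} = 150
$$

Here $R$ is the number of element reads of `A` and $M$ is the number of 1 kB blocks copied from RAM. For a sweep that walks memory sequentially from a 1 kB boundary, $M = \lceil B / 1024 \rceil$, where $B$ is the number of bytes touched.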
Given this machine, I want to know the rules for calculating the CPU-cycle cost of a code snippet. As an example, take this code, where `A` is a 1024x1024 `int` matrix that has already been initialized with integers:
```c
#define N 1024
sum = 0;
for (i = 0; i < N; i++)
{
    for (j = 0; j < N; j++)
        sum += A[i][j];
}
```
- How do I go about calculating the number of CPU cycles required to fetch the data of matrix `A`? I'm confused about how the matrix `A` will be fetched from main memory. I'm not looking for an exact answer; I just want to know the procedure for figuring it out. I'm also not sure how the different levels of memory are used by the code.
  For example, when `i = 0` and `j = 0` (the first iteration), `A` will be fetched from main memory, right? Would that mean 1 kB of data is transferred from main memory into the cache, or only 4 bytes, since that element is just an `int`? And what about the memory for the instructions or operations? I'm just confused about this.
- What if I replace `A[i][j]` with `A[j][i]` above?
- Also, if the exact same code is written in FORTRAN, what difference would it make?
EDIT: I just want to know how to calculate the CPU cycles for fetching only the data of matrix `A`.