I have the following C++ code snippet (the C++ part is the profiler class, which is omitted here), compiled with VS2010 on a 64-bit Intel machine. The code simply multiplies an array of floats (`arr2`) by a scalar and accumulates the result into another array (`arr1`):
int M = 150, N = 150;
int niter = 20000; // do many iterations to have a significant run-time
float *arr1 = (float *)calloc(M*N, sizeof(float));
float *arr2 = (float *)calloc(M*N, sizeof(float));
// Read data from file into arr2
float scale = float(6.6e-14);
// START_PROFILING
for (int iter = 0; iter < niter; ++iter) {
    for (int n = 0; n < M*N; ++n) {
        arr1[n] += scale * arr2[n];
    }
}
// END_PROFILING
free(arr1);
free(arr2);
The reading-from-file part and the profiling (i.e. run-time measurement) are omitted here for simplicity.
When `arr2` is initialized to random numbers in the range [0, 1], the code runs about 10 times faster than when `arr2` is initialized to a sparse array in which about 2/3 of the values are zeros. I have played with the compiler options `/fp` and `/O`, which changed the run-time a little, but the ratio of roughly 1:10 remained.
- How come the performance depends on the actual values? What does the CPU do differently that makes the sparse data run ~10 times slower?
- Is there a way to make the "slow data" run faster, or will any optimization (e.g. vectorizing the calculation, as in the sketch below) have the same effect on both arrays (i.e., the "slow data" will still run slower than the "fast data")?
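To make the second question concrete, by "vectorizing" I mean something along the lines of this SSE sketch (untested, and assuming M*N is a multiple of 4, which it is here):

#include <xmmintrin.h>

// Hypothetical SSE version of the inner loop; size (M*N) must be a multiple of 4.
void scale_accumulate_sse(float *arr1, const float *arr2, float scale, int size) {
    __m128 vscale = _mm_set1_ps(scale);
    for (int n = 0; n < size; n += 4) {
        __m128 a = _mm_loadu_ps(&arr1[n]);          // load 4 floats from arr1
        __m128 b = _mm_loadu_ps(&arr2[n]);          // load 4 floats from arr2
        a = _mm_add_ps(a, _mm_mul_ps(vscale, b));   // arr1[n..n+3] += scale * arr2[n..n+3]
        _mm_storeu_ps(&arr1[n], a);
    }
}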
EDIT
Complete code is here: https://gist.github.com/1676742; the command line for compiling is in a comment in `test.cpp`.
The data files are here: