I wrote some C++ code with deliberately dumb processing just to test a few cache optimizations and study code improvements, and I ran into something very weird...
With the arrays declared as static, or declared outside of main as globals, the code runs in about 0.5 seconds on average. If I just move the arrays inside main, the same processing runs in about 15 seconds on average. I can't figure out why, and all I can find are articles about how local variables are faster than globals.
Does anyone have an idea of what's happening? I'm compiling C++ on Windows with g++ installed via MinGW, and running the code on a desktop with an i3-7100.
EDIT:
- The goal is not speed improvement in itself; these are just tests to study cache usage by moving, removing, or merging arrays. The optimization flags do indeed turn this into perfect code with perfect speed. But what are they changing? What was wrong with the arrays' location that the flags are fixing?
- I'm just compiling with g++ -o, without any optimization flags (roughly as shown below).
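For reference, the two builds I'm comparing are roughly these (the file name is just a placeholder):

g++ -o cache_test cache_test.cpp        # my build: no -O flag, so g++ defaults to -O0
g++ -O2 -o cache_test cache_test.cpp    # optimized build, which makes the slowdown disappear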
EDIT 2:
I added the array initializations as some comments suggested, and the uninitialized version is indeed the slow one, while the initialized version runs at the same speed as the global version. Why? What's happening under the hood?
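To be concrete, this is roughly the difference between the two local variants (a minimal sketch, not the exact code of my first test):

double output[N], values[N], error[N];                  // uninitialized locals: the ~15 s case
double output[N] = {}, values[N] = {}, error[N] = {};   // zero-initialized locals: ~0.5 s, same as globals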
EDIT 3:
After some suggestions and explanations in the comments, I ran more tests in a fair test environment created by @paddy and shared in the comments: https://godbolt.org/z/8ev85qjP5
Code:
#include <chrono>
#include <iostream>

#define TAM 10
#define N 10000

#ifdef USE_GLOBAL
// Arrays as globals (selected with -DUSE_GLOBAL).
volatile double output[N] = {}, values[N] = {}, error[N] = {};
#endif

int main()
{
#ifndef USE_GLOBAL
    // Same arrays, but as locals on the stack inside main.
    volatile double output[N] = {}, values[N] = {}, error[N] = {};
#endif
    std::cout << "Starting" << std::endl;

    auto t1 = std::chrono::high_resolution_clock::now();
    {
        for (int total = 0; total < TAM; total++) {
            for (int i = 0; i < N; i++) {
                for (int j = 0; j < N; j++) {
                    output[i] += (values[j] + error[j]) / i + 1;
                }
            }
        }
    }
    auto t2 = std::chrono::high_resolution_clock::now();

    auto duration =
        (std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1)
             .count());
    float time = (float)duration / 1000000;
    std::cout << "Processing time = " << time << " seconds."
              << std::endl;
}
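To compare the two cases, I build the test above once with and once without -DUSE_GLOBAL to switch between the global and the local arrays (the file names are just placeholders):

g++ -DUSE_GLOBAL -o test_global test.cpp   # arrays declared as globals
g++ -o test_local test.cpp                 # arrays declared inside main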