I have an assignment to optimize a for loop so that the compiled code runs faster. The goal is to get the run time down to 5 seconds or less; the original code takes about 23 seconds. The original code looks like this:
#include <stdio.h>
#include <stdlib.h>
#define N_TIMES 600000
#define ARRAY_SIZE 10000
int main(void)
{
    double *array = calloc(ARRAY_SIZE, sizeof(double));
    double sum = 0;
    int i;
    printf("CS201 - Asgmt 4 - I. Forgot\n");
    for (i = 0; i < N_TIMES; i++) {
        int j;
        for (j = 0; j < ARRAY_SIZE; j++) {
            sum += array[j];
        }
    }
    return 0;
}
My first thought was to unroll the inner for loop, which got the run time down to 5.7 seconds. That loop looked like this:
for (j = 0; j < ARRAY_SIZE - 11; j += 12) {
    sum = sum + (array[j] + array[j+1] + array[j+2] + array[j+3]
               + array[j+4] + array[j+5] + array[j+6] + array[j+7]
               + array[j+8] + array[j+9] + array[j+10] + array[j+11]);
}
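One thing worth noting about the unrolled loop above: 10000 is not a multiple of 12, so the loop stops at j = 9996 and never adds array[9996] through array[9999]. That is harmless here only because calloc zero-fills the array. A minimal sketch of the usual fix, unrolling plus a cleanup loop for the leftovers, is shown below. The function name sum_unrolled and the unroll factor of 4 are assumptions chosen for illustration and are not part of the assignment code.

```c
#include <stddef.h>

/* Hypothetical helper (not from the assignment): sums n doubles using a
 * loop unrolled by 4, followed by a cleanup loop for the remainder. */
double sum_unrolled(const double *a, size_t n)
{
    double sum = 0.0;
    size_t j = 0;

    /* Main loop: four elements per iteration while at least four remain. */
    for (; j + 3 < n; j += 4) {
        sum += a[j] + a[j + 1] + a[j + 2] + a[j + 3];
    }

    /* Cleanup loop: the 0 to 3 elements the main loop did not reach. */
    for (; j < n; j++) {
        sum += a[j];
    }

    return sum;
}
```

The same pattern applies to an unroll factor of 12: keep the main loop condition as it is and add a second loop that continues from the final value of j up to ARRAY_SIZE.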
Unrolling beyond 12 array elements per iteration gave no further improvement, so my next thought was to introduce some parallelism by splitting the sum across two variables:
sum = sum + (array[j] + array[j+1] + array[j+2] + array[j+3] + array[j+4] + array[j+5]);
sum1 = sum1 + (array[j+6] + array[j+7] + array[j+8] + array[j+9] + array[j+10] + array[j+11]);
That actually slowed the code down, and each additional variable slowed it down further. I'm not sure whether parallelism doesn't help here or whether I'm implementing it wrong, but now I don't know how else to optimize the loop to get it below 5 seconds.
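For reference, the multiple-accumulator idea is usually written so that each accumulator carries its own independent chain of additions, with the partial sums combined only once after the loop. The sketch below shows that shape. The function name sum_two_chains is hypothetical, and whether this helps with optimizations turned off is an open question rather than a measured result. One plausible reading of the slowdown described above is that the parenthesized sums in the 12-way unrolled loop already do not depend on sum, so adding sum1 mostly adds another load and store per iteration in unoptimized code.

```c
#include <stddef.h>

/* Hypothetical helper (not from the assignment): sums n doubles using two
 * independent accumulators that are combined once at the end. */
double sum_two_chains(const double *a, size_t n)
{
    double sum0 = 0.0;  /* running total of even-indexed elements */
    double sum1 = 0.0;  /* running total of odd-indexed elements */
    size_t j = 0;

    /* Each accumulator depends only on its own previous value, so the two
     * additions in one iteration do not have to wait on each other. */
    for (; j + 1 < n; j += 2) {
        sum0 += a[j];
        sum1 += a[j + 1];
    }

    /* Pick up the last element when n is odd. */
    if (j < n) {
        sum0 += a[j];
    }

    /* Combine the partial sums exactly once, after the loop. */
    return sum0 + sum1;
}
```

Note that splitting a floating-point sum this way changes the order of the additions, so for non-zero data the result can differ from the sequential sum in the last few bits.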
EDIT: I forgot to mention that I can't make any changes to the outer loop, only to the inner loop.
EDIT2: This is the part of the code I'm trying to optimize for my assignment:
for (j = 0; j < ARRAY_SIZE; j++) {
    sum += array[j];
}
I'm using gcc with the flags gcc -m32 -std=gnu11 -Wall -g a04.c -o a04. All compiler optimizations are turned off.