-1

I have an assignment to optimize a for loop so the compiler compiles code that runs faster. The objective is to get the code to run in 5 or less seconds, with the original run time being around 23 seconds. The original code looks like this:

#include <stdio.h>
#include <stdlib.h>

#define N_TIMES     600000
#define ARRAY_SIZE   10000

int main(void)
{
    double  *array = calloc(ARRAY_SIZE, sizeof(double));
    double  sum = 0;
    int     i;

    printf("CS201 - Asgmt 4 - I. Forgot\n");

    for (i = 0; i < N_TIMES; i++) {

        int     j;

        for (j = 0; j < ARRAY_SIZE; j++) {
            sum += array[j];
            }

        }

    return 0;
}

My first thought was to do loop unrolling on the inner for loop which got it down to 5.7 seconds and that loop looked like this:

 for (j = 0; j < ARRAY_SIZE - 11; j+= 12) {
            sum = sum + (array[j] + array[j+1] + array[j+2] + array[j+3] + array[j+4] + array[j+5] + array[j+6] + array[j+7] + array[j+8] + array[j+9] + array[j+10] + array[j+11]);
                }

After taking it out to 12 spots in the array per loop the performance wasn't increasing anymore so my next thought was to try and introduce some parallelism so I did this:

   sum = sum + (array[j] + array[j+1] + array[j+2] + array[j+3] + array[j+4] + array[j+5]);
   sum1 = sum1 + (array[j+6] + array[j+7] + array[j+8] + array[j+9] + array[j+10] + array[j+11]);

That actually ended up slowing down the code and each additional variable again slowed the code down more so. I'm not sure if parallelism doesn't work here or if I'm implementing it wrong or what but that didn't work so now I'm not really sure how I can optimize it anymore to get it below 5 seconds.

EDIT: I forgot to mention I can't make any changes to the outer loop, only the inner loop

EDIT2: This is the part of the code I'm trying to optimize for my assignment:

        for (j = 0; j < ARRAY_SIZE; j++) {
            sum += array[j];
            }

Im using gcc compiler with the flags gcc -m32 -std=gnu11 -Wall -g a04.c -o a04 All compiler optimizations are turned off

csStudent
  • 128
  • 2
  • 12
  • 1
    Your `array` is all zeros. And the program doesn't have any output. So the optimized program will be the empty one. Update: Sorry, it should actually print "CS201 - Asgmt 4 - I. Forgot\n"... – Eugene Sh. Aug 22 '17 at 21:19
  • 3
    What makes you think there's gong to be any parallelism in the 'parallel' code? – Jonathan Leffler Aug 22 '17 at 21:19
  • You don't need the outer `i` loop since it's all just repetitive additions. `for (j = 0; j < ARRAY_SIZE; j++) { sum += N_TIMES * array[j] }`. Get rid of the `for (i = 0...` loop. Of course, `array` isn't initialized so you'll get zeroes or whatever. But I assume that's not the point of your code. – lurker Aug 22 '17 at 21:24
  • @Dante you missed an obvious optimization (as recommended by information_interchange), but the process you followed in unrolling the loop and in determining the best number to unroll is a good skill to keep in your back pocket. Some day, tricks like that will be very useful for you. – Scott Mermelstein Aug 22 '17 at 21:27
  • @EugeneSh. C does not require floating-point zero to be bitwise zero. At best, the behavior is implementation defined. – EOF Aug 22 '17 at 21:34
  • 1
    You do not use this table or the result so any compile time optimisation will remove both loops :) – 0___________ Aug 22 '17 at 21:46
  • Here, it's a duplicate: [C loop optimization help for final assignment](https://stackoverflow.com/q/32000917/69809). As some answers state in the link, whoever wrote this assignment doesn't really understand what's important in algorithmic complexity, optimizing compilers, and probably programming in general. Another thread here too: [Optimized sum for an array of doubles](https://stackoverflow.com/q/31918680/69809). – vgru Aug 22 '17 at 21:48
  • Does the inner loop have to stay a loop? Your unrolling indicates that not. In that case use a flag to notice execution in the first time through the outer loop, execute the inner loop, store the result and in all following execution just add the stored sum. I.e. do not bother about optimising the first time, optimise all others by 99.99%. – Yunnosch Aug 22 '17 at 22:03

2 Answers2

5

Since j and i don't depend on one another, I think you can just do:

for (j = 0; j < ARRAY_SIZE; j++) {
    sum += array[j];
}

sum *= N_TIMES
takendarkk
  • 3,347
  • 8
  • 25
  • 37
information_interchange
  • 2,538
  • 6
  • 31
  • 49
-1

You can move the declaration of variable 'j' out of the loop like so:

 int j;
 for (i = 0; i < N_TIMES; i++) {

    //int     j; <-- Move this line out of the loop
    for (j = 0; j < ARRAY_SIZE - 11; j+= 12) {
        sum = sum + (array[j] + array[j+1] + array[j+2] + array[j+3] + array[j+4] + array[j+5] + array[j+6] + array[j+7] + array[j+8] + array[j+9] + array[j+10] + array[j+11]);
    }
 }

You don't need to declare a new variable 'j' each time the loop runs.

Devin L.
  • 437
  • 2
  • 12
  • This is most unlikely to affect performance. – Jonathan Leffler Aug 23 '17 at 00:58
  • This is worse style for zero perf benefit. If anything, declaring in an inner scope would make lifetime analysis easier for the compiler. But in practice compilers have no problem tracing variable lifetimes. Even with optimization disabled like in this silly assignment, `for ( int j = 0 ; ...)` has no downside with the way actual compilers work. They don't move the stack pointer around to reserve/free stack space for local variables when entering/leaving scopes, they still reserve the space they'll need once on entry to the function. (Unless you use VLAs or `alloca`.) – Peter Cordes Jul 15 '22 at 00:30