2

I was presented example of a loop which should be slower than the one after this:

for (i = 0; i < 1000; i++) 
   column_sum[i] = 0.0;
     for (j = 0; j < 1000; j++)
        column_sum[i] += b[j][i];

Comparing to this one:

for (i = 0; i < 1000; i++)
     column_sum[i] = 0.0;
for (j = 0; j < 1000; j++)
     for (i = 0; i < 1000; i++)
        column_sum[i] += b[j][i];

Now, I coded a tool to test number of different index numbers, but I am not seeing much of performance advantage there after I tried this concept, and I'm afraid that my code has something to do with it...

Should be slower loop that works within my code:

    for (i = 0; i < val; i++){
        column_sum[i] = 0.0;
        for (j = 0; j < val; j++){
            int index = i * (int)val + j;
            column_sum[i] += p[index];
        }
    }

Should be "significantly" faster code:

    for (i = 0; i < val; i++) {
        column_sum[i] = 0.0;
    }
    for (j = 0; j < val; j++) {
        for (i = 0; i < val; i++) {
            int index = j * (int)val + i;
            column_sum[i] += p[index];
        }
    }

Data comparison:

enter image description here

HelpNeeder
  • 6,383
  • 24
  • 91
  • 155
  • you are saying that after you have coded the tool, the second one should be faster? – Haris Sep 15 '14 at 09:12
  • Yes, my guess is that my 2 loops have different indexes, i just picked the better working for each (`int index = j * (int)val + i;` vs `int index = i * (int)val + j;`.) My understanding is that the tool can not only modify the order of loops, but also the indexes. One's for row, and other for column data access... Row's faster than Column, so which index should be appropriate? – HelpNeeder Sep 15 '14 at 09:18
  • 3
    Are you aware that in the first example there are no curly braces so the indentation may not match the block structure of the code? There is no nested loop and it's no surprise it's quicker than the nested loops. Is this an oversight? – Jens Sep 15 '14 at 09:19
  • Jens, I'm aware, but where would `i` come from in `column_sum[i] += b[j][i];`? I assumed it was an error. First 2 loops are copy/paste what i was presented and assumed it was error. – HelpNeeder Sep 15 '14 at 09:20
  • @HelpNeeder The value for `i` is the one after the first loop was left, and that's 1000 and used over and over in the loop over j. – Jens Sep 15 '14 at 09:23
  • 1
    why would the 2nd code be faster, you are redundantly running the "column_sum[i] = 0.0;" outside in another loop rather than doing it inside the same loop in code 1 – Haris Sep 15 '14 at 09:27
  • Jens, about second loop: `In this case, the matrix is traversed in a row-order and performance can be expected to be significantly better.` – HelpNeeder Sep 15 '14 at 09:33
  • look at http://stackoverflow.com/questions/8547778/why-is-one-loop-so-much-slower-than-two-loops – mch Sep 15 '14 at 09:37
  • What optimisations did you turn on in the compiler? That makes a massive difference. – gnasher729 Sep 15 '14 at 09:41
  • @HelpNeeder i think this will be the proper code `int index = j * (int)val + i;` – Haris Sep 15 '14 at 09:54
  • Nope, the loop code is right, just my index wasn't right. Thanks anyways, guys. – HelpNeeder Sep 15 '14 at 15:47
  • for some clarity... are you saying that 'val' is a value read from the user? are you using something like 'malloc' to have the right size of area available for 'column_sum'? – user3629249 Sep 16 '14 at 11:50
  • since all the calculated values are integer, why is the column_sum defined as float? In general, float operations are may times slower than integer operations. if 'column_sum' is defined as integer, then the required area can be acquired via 'calloc' and preinitialized to 0. This will be much faster, will eliminate one of your code loops, and eliminate the inclusion of many float library routines. – user3629249 Sep 16 '14 at 11:56
  • The line: int index = j * (int)val + i; inside the inner loop will result in an entry being added/removed from the stack, many times for every cell in the column_sum array. I would strongly suggest moving that line before the first line of the outer loop. Better yet, eliminate the int index = j * (int)val + i; line altogether and place the calculation as part of the setting of the column_sum[] entries. – user3629249 Sep 16 '14 at 12:00
  • The purpose of this program was to demonstrate the speed of the loops. And we are also comparing ROW vs COLUMN data access ways. – HelpNeeder Sep 16 '14 at 16:17

1 Answers1

0

I had confused the Index values in the loops: int index = j * (int)val + i;

Slower loop:

    for (i = 0; i < val; i++) {
        column_sum[i] = 0.0;
        for (j = 0; j < val; j++){
            int index = j * (int)val + i;
            column_sum[i] += p[index];
        }
    }

Faster loop:

    for (i = 0; i < val; i++) {
        column_sum[i] = 0.0;
    }
    for (j = 0; j < val; j++) {
        for (i = 0; i < val; i++) {
            int index = j * (int)val + i;
            column_sum[i] += p[index];
        }
    }
HelpNeeder
  • 6,383
  • 24
  • 91
  • 155