
I'm trying to learn how to optimize my C code, so I found some articles on the internet and rewrote my function so that it should execute faster. When I compile without optimization flags it behaves as expected (the second function is about 12% faster than the first), but when I compile with gcc -O3 the second function is much slower (by about 50%). Do you have any idea why that is? Thanks for any help.

First function:

typedef struct {
    double *data;
    int rows;
    int columns;
} Matrix;

Matrix *matrixMultiplication(Matrix *a, Matrix *b) {
    if(a->columns != b->rows)
        return NULL;
    Matrix *matrix = createMatrix(a->rows, b->columns);
    set(0, matrix);
    for(int i = 0; i < matrix->rows; i++) {
        for(int j = 0; j < a->columns; j++) {
            for(int k = 0; k < b->columns; k++) {
                matrix->data[i * matrix->columns + k] += a->data[i * a->columns + j] * b->data[j * b->columns + k];
            }
        }
    }
    return matrix;
}

Second function:

typedef struct {
    float *data;
    unsigned int rows;
    unsigned int columns;
} Matrix_2;

unsigned int matrixMultiplication_2(Matrix_2 *a, Matrix_2 *b, Matrix_2 **c) {
    Matrix_2 *matrix;
    if(a->columns != b->rows)
        return 0;
    createMatrix_2(a->rows, b->columns, &matrix);
    set_2(0, matrix);
    for(unsigned int i = matrix->rows; i--;) {
        for(unsigned int j = a->columns; j--;) {
            for(unsigned int k = b->columns; k--;) {
                matrix->data[i * matrix->columns + k] += a->data[i * a->columns + j] * b->data[j * b->columns + k];
            }
        }
    }
    *c = matrix;
    return 1;
}
Paprikadobi
    If you want to compare two different algorithms of the same functionality, then make sure you use the same data and data-structures. Otherwise comparison is moot. – Some programmer dude Jul 15 '18 at 18:09
  • I use the same data, but I changed the data structures because I read that I should rather use floats and unsigned ints where possible – Paprikadobi Jul 15 '18 at 18:13
  • @PatrikDobiáš Since you are using float for one and double for the other you are not using the same data. – klutt Jul 15 '18 at 18:38
  • You should use proper types. For example, for indexes and sizes you have `size_t`. On modern systems you may get a penalty when using floats, as they may have to be converted to doubles or even wider types. – 0___________ Jul 15 '18 at 18:40
  • How large are the dimensions? – user58697 Jul 15 '18 at 19:04
  • @user58697 a is 100 rows x 500 columns, b is 500 rows x 100 columns – Paprikadobi Jul 15 '18 at 19:09
  • @klutt even when I use the same struct for both, the execution time barely changes – Paprikadobi Jul 15 '18 at 19:10
  • @PeterJ_01 thanks for the advice about `size_t`, it actually helped and reduced the execution time of the second function by two thirds – Paprikadobi Jul 15 '18 at 19:13
  • @NickS: One could just as well reason that preincrement is slower because the program has to wait for the processor to perform the increment, whereas with postincrement the processor can immediately use the value while an add instruction is in progress to perform the increment (with its destination set to another register). In C++, preincrement may be more efficient with complicated objects for which postincrement requires making copies of them. But in C, it is useless to consider such microoptimizations without intimate knowledge of the compiler. – Eric Postpischil Jul 15 '18 at 19:23
  • @EricPostpischil Oh, thanks for the clarification, it is useful information. I should read more about that. – Nick S Jul 15 '18 at 19:26
  • next step for speed might be to use unrolling https://github.com/deuxbot/fast-matrix-multiplication – francek Jul 15 '18 at 19:37
  • Do not optimize until you have measured, are sure there is a problem, and know that you are optimizing the correct thing. As one commentary on Knuth put it: "So when Prof. Knuth said 'Premature optimization is the root of all evil' he meant don't lose readability of your code before it is absolutely necessary." I was hired at IBM on the basis of my answer to one question: how would I optimize someone's code? I said I would first measure the code's timing; I did not get past that sentence before I was cut off and hired. – zaph Jul 15 '18 at 23:28
  • @francek Ok, I tried the code in that link and I wonder why `mxm_naive` is slower than `mxm_block`, when you still have to do the same number of steps but use more for loops – Paprikadobi Jul 16 '18 at 05:19
  • Very good question Paprikadobi. I don't know and unfortunately don't have time to investigate. I would expect it has something to do with how the CPU, registers and addressing work (see the blocking sketch after these comments). I just searched for "C matrix multiplication" and this code seemed to be helpful. I did some optimization in the past and unrolling was a pretty effective method: https://stackoverflow.com/questions/23253358/optimizing-arm-cortex-m3-code/23598850#23598850 – francek Jul 16 '18 at 07:33
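
Regarding the `mxm_naive` vs `mxm_block` question above: the usual explanation is cache locality rather than the number of loops. Processing the matrices in small tiles keeps the slices of all three arrays that are currently being reused resident in the CPU cache, so far fewer accesses go out to main memory. The sketch below only illustrates the idea, reusing the `Matrix` struct from the question; the function name, the `BLOCK` size of 64, and the assumption that the output matrix is already allocated and zeroed are choices made for this example, not taken from the linked repository.

#include <stddef.h>

#define BLOCK 64  /* tile size; the best value depends on the target CPU's caches */

/* Minimal cache-blocking sketch. Assumes `out` is already allocated to
   a->rows x b->columns and zeroed. */
void matrixMultiplyBlocked(const Matrix *a, const Matrix *b, Matrix *out) {
    size_t n = a->rows, m = a->columns, p = b->columns;
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t jj = 0; jj < m; jj += BLOCK)
            for (size_t kk = 0; kk < p; kk += BLOCK)
                /* Work on one tile at a time so the parts of a, b and out
                   being touched stay in cache while they are reused. */
                for (size_t i = ii; i < n && i < ii + BLOCK; i++)
                    for (size_t j = jj; j < m && j < jj + BLOCK; j++) {
                        double aij = a->data[i * m + j];
                        for (size_t k = kk; k < p && k < kk + BLOCK; k++)
                            out->data[i * p + k] += aij * b->data[j * p + k];
                    }
}

With the 100x500 and 500x100 matrices mentioned above, each input alone is around 400 KB of doubles, which already exceeds a typical L1 cache, so blocking (and, to a lesser extent, unrolling) tends to pay off here.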

1 Answer


That's because compiler optimizations are based on pattern recognition. Your compiler knows a ton of typical code patterns, and knows how to transform them to yield faster code. However, while this library of code patterns is quite extensive, it's finite.

The first function uses the canonical for(int i = 0; i < count; i++) loop control. You can bet that any compiler worth its salt has a pattern for this, yielding close to optimal code for the loop control.

The second function uses a pattern that's rarely seen in human code. While I personally like this pattern for its brevity, there are many programmers out there who find it too cryptic to use. Apparently, your compiler does not come with an optimizer pattern for this, so the resulting code does not get fully optimized.


Optimizations like replacing for(int i = 0; i < count; i++) with for(int i = count; i--;) were useful when C was still little more than a high-level assembler. But compiler optimizations have long turned code translation into a much too complicated beast to be optimized by such tricks. Today, most optimizations need to be done on the algorithmic level. Translation level optimizations should generally be left to the compiler and fostered by writing canonical code that the compiler can optimize.
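
To make that concrete, here is a sketch of the second function with nothing changed except the loop control, written in the canonical counting-up form with `size_t` indices as suggested in the comments. The function name is made up for this example, `createMatrix_2` and `set_2` are assumed to behave as in the question, and hoisting the `a` element into a local is mainly for readability (it may also help the optimizer, which cannot easily prove that the output does not alias the inputs).

#include <stddef.h>

/* Sketch: same algorithm as matrixMultiplication_2, but with canonical
   counting-up loops and size_t indices. */
unsigned int matrixMultiplication_2_canonical(Matrix_2 *a, Matrix_2 *b, Matrix_2 **c) {
    Matrix_2 *matrix;
    if (a->columns != b->rows)
        return 0;
    createMatrix_2(a->rows, b->columns, &matrix);
    set_2(0, matrix);                                   /* zero the result */
    for (size_t i = 0; i < matrix->rows; i++) {
        for (size_t j = 0; j < a->columns; j++) {
            float aij = a->data[i * a->columns + j];    /* constant across the k loop */
            for (size_t k = 0; k < b->columns; k++) {
                matrix->data[i * matrix->columns + k] += aij * b->data[j * b->columns + k];
            }
        }
    }
    *c = matrix;
    return 1;
}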

cmaster - reinstate monica
  • Yes, compilers are quite good. I once wrote some crypto code in "C" and compiled with optimizations on. Then, having been an assembler programmer for years, I looked at the generated code. I could save one instruction by using a register that is generally reserved, maybe getting a speed increase of 2%. Since this code ran 24/7 for months at a time, that was probably time well spent. In general it is not worth the effort; the compiler was terrific at optimization. – zaph Jul 15 '18 at 23:36