Cache in C++, what's wrong?

Question

In the following code, why is f2() faster than f1()?

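```cpp
#include <cstdio>   // printf
#include <ctime>    // clock, clock_t

const int n = 3;
const int m = 4096;

// Row order: the inner loop walks along a row, so consecutive stores are adjacent.
// Note: mas is never read, so an optimizing compiler may remove these stores entirely.
void f1()
{
    int mas[n][m];
    //int mas_1[n][m];
    for (int i = 0; i < n; ++i)
    {
        for (int i_1 = 0; i_1 < m; ++i_1)
        {
            mas[i][i_1] = 1;
            //mas_1[i][i_1] = 1;
        }
    }
}

// Column order: the inner loop walks down a column, so consecutive stores are m ints apart.
void f2()
{
    int mas[n][m];
    //int mas_1[n][m];
    for (int i = 0; i < m; ++i)
    {
        for (int i_1 = 0; i_1 < n; ++i_1)
        {
            mas[i_1][i] = 1;
            //mas_1[i_1][i] = 1;
        }
    }
}
int main()
{
    for (size_t a = 0; a < 10; ++a)
    {
        {
            clock_t s = clock();
            for (size_t i = 0; i < 10000; ++i)
            {
                f1();
            }
            clock_t e = clock();
            // clock_t is not necessarily int, so cast before printing.
            printf("T1: %ld\n", static_cast<long>(e - s));
        }
        {
            clock_t s = clock();
            for (size_t i = 0; i < 10000; ++i)
            {
                f2();
            }
            clock_t e = clock();
            printf("T2: %ld\n", static_cast<long>(e - s));
        }
    }
}
```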

Optimisation is disabled (x64 Release build). I know about caching, but why doesn't it work here?
When n is greater than some N, f1() is faster than f2().
I have an i5 HQ4750 at 2 GHz with 8 MB cache.

Neither function does anything observable, so both can be optimized out. — juanchopanza, Aug 15 '16 at 18:06
I think you are looking for [this](http://stackoverflow.com/questions/33722520/why-is-iterating-2d-array-row-major-faster-than-column-major) — NathanOliver, Aug 15 '16 at 18:08
Show us the way you check. If you run f1() k-times, then f2() k-times, then chances are, f2() is optimized out and f1() becomes memset(). — lorro, Aug 15 '16 at 18:08
@lorro There's nothing to memset. The functions don't do anything. — juanchopanza, Aug 15 '16 at 18:11
Both functions should take the array as an argument so the assignments do something external. — Glenn Teitelbaum, Aug 15 '16 at 18:21
Could you use `i` and `j` like a normal person, instead of `i` and `i_1`? That would be easier to read. But more importantly, if these functions take measurable amounts of time, it means you compiled without optimization enabled (which is bogus), and that you're seeing the difference between contiguous and strided access like Nathan pointed out. — Peter Cordes, Aug 15 '16 at 18:25
I use `row` and `column` for stuff like this. Does that make me abnormal? I can't be abnormal! All those years of therapy would have been wasted! — user4581301, Aug 15 '16 at 18:43
Could you post more details? How can you say `f2()` is *faster* than `f1()`? Do you have some results? — BiagioF, Aug 15 '16 at 18:57
Yes, I disabled optimisation and used an x64 Release build. But logically, row-order f1() should be faster than column-order f2(). I looked at the disassembly, and it's the same for f1() and f2(). — Ivan Kamynin, Aug 15 '16 at 19:09
Benchmarking without optimization is usually totally bogus. It might possibly not be in this case because cache effects can dominate the extra instructions from an unoptimized build, but extra store-forwarding latency in one vs. the other for loop variables could be spoiling your results. See the performance links in the [x86 tag wiki](http://stackoverflow.com/tags/x86/info). — Peter Cordes, Aug 15 '16 at 19:26
Downvoting for profiling with optimizations off and not mentioning it and not justifying it in the question. **Every C++ performance question on this site asks about that, almost**. — Yakk - Adam Nevraumont, Aug 16 '16 at 00:03

0 Answers