
I was curious if a 1-Dimensional array is faster than a jagged array, and I measured the performance of the following blocks of code:

Test 1: Jagged Arrays

Stopwatch sw = new Stopwatch();
double test;
int i, j, k, l;
double[][][][] jagged = ArrayExtensions.Get4DMatrix<double>(100, 100, 50, 50, 0);
for (int iter = 0; iter < 5; iter++)
{
    sw.Restart();
    for (i = 0; i < 100; i++)
    {
        for (j = 0; j < 100; j++)
        {
            for (k = 0; k < 50; k++)
            {
                for (l = 0; l < 50; l++)
                {
                    test = jagged[i][j][k][l];
                    jagged[i][j][k][l] = test;
                }
            }
        }
    }
    Console.WriteLine("Jagged Arrays, Test {0}: {1} ms", iter, sw.ElapsedMilliseconds);
}

Test 2: Single-dimension arrays

double[] single = ArrayExtensions.Get1DArray<double>(25000000);
for (int iter = 0; iter < 5; iter++)
{
    sw.Restart();
    for (i = 0; i < 100; i++)
    {
        for (j = 0; j < 100; j++)
        {
            for (k = 0; k < 50; k++)
            {
                for (l = 0; l < 50; l++)
                {
                    test = single[i * 100 + j * 100 + k * 50 + l];
                    single[i * 100 + j * 100 + k * 50 + l] = test;
                }
            }
        }
    }
    Console.WriteLine("Single Arrays, Test {0}: {1} ms", iter, sw.ElapsedMilliseconds);
}

Running the test yields:

Jagged Arrays, Test 0: 1447 ms
Jagged Arrays, Test 1: 1429 ms
Jagged Arrays, Test 2: 1431 ms
Jagged Arrays, Test 3: 1430 ms
Jagged Arrays, Test 4: 1429 ms

Single Arrays, Test 0: 386 ms
Single Arrays, Test 1: 387 ms
Single Arrays, Test 2: 386 ms
Single Arrays, Test 3: 387 ms
Single Arrays, Test 4: 387 ms

Also, I ran the tests only with assignment to the array and then only with reading from the array, and the results had the same ratios.

I was expecting the 1-dimensional array to be faster than the jagged array, but I was quite surprised to see that the second block executes in only 27% of the execution time of the first.

Could someone explain why this huge difference occurs? Also, are there any drawbacks to using 1-dimensional arrays (besides code readability, which obviously suffers, and perhaps an increased risk of indexing errors)?

The code was executed in a non-optimized build. In an optimized build both tests execute in under 100 ms per iteration, but I think that has more to do with the code executed inside the loops. Still, the 1-dimensional array is 50% faster than the jagged array.

Tiborg
  • This will be useful. http://stackoverflow.com/questions/468832/why-are-multi-dimensional-arrays-in-net-slower-than-normal-arrays – Vimal CK Aug 10 '13 at 19:06
  •
    why is it surprising that performing 4 "dereference a vector and resolve by index, with null and bounds checks" (`ldelem` - and `stelem` for the assign) is slower than performing 1 "dereference a vector and resolve by index, with null and bounds checks" ? – Marc Gravell Aug 10 '13 at 19:07
  • @VimalCK while *true*, in this case it is a jagged array - so 4 separate vectors, not a multi-dimensional array – Marc Gravell Aug 10 '13 at 19:08
  •
    Bear in mind that in your jagged array version there are *multiple* nullity and bounds checks. – Jon Skeet Aug 10 '13 at 19:10

3 Answers

   test = single[i * 100 + j * 100 + k * 50 + l];

A wise programmer once said: "Never trust a benchmark you haven't falsified yourself". Possibly unintentional, but there is a pretty nasty bug in your code that makes you compare apples and oranges: the multipliers are completely wrong. The i index must be multiplied by 100*50*50, the j index by 50*50.

The side effect is that you are much more likely to use the CPU caches effectively, since you address much less memory. That makes an enormous difference; RAM is very slow.
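For reference, a minimal sketch (not code from the answer) of the corrected row-major flattening for the question's 100 x 100 x 50 x 50 dimensions, where each index is scaled by the product of the dimensions that follow it:

```csharp
using System;

class FlattenDemo
{
    static void Main()
    {
        // Dimensions from the question: 100 x 100 x 50 x 50.
        const int d2 = 100, d3 = 50, d4 = 50;

        // Row-major flattening: ((i * d2 + j) * d3 + k) * d4 + l
        // is equivalent to i*(d2*d3*d4) + j*(d3*d4) + k*d4 + l.
        int Flatten(int i, int j, int k, int l)
            => ((i * d2 + j) * d3 + k) * d4 + l;

        Console.WriteLine(Flatten(0, 0, 0, 1)); // 1 (adjacent element)
        Console.WriteLine(Flatten(0, 0, 1, 0)); // 50 (one k-row)
        Console.WriteLine(Flatten(1, 0, 0, 0)); // 250000 (one full i-slab: 100*50*50)
    }
}
```

With the question's multipliers (`i * 100 + j * 100 + k * 50 + l`), the benchmark only ever touched roughly the first 22,500 elements, so it ran almost entirely out of cache.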

Hans Passant

Maybe because "Jagged Arrays" are arrays of references (to arrays)... In your example you have 4 levels of indirection:

jagged[i][j][k][l]
  • gets offset i from "jagged"
  • gets offset j from previous result
  • gets offset k from previous result
  • gets offset l from previous result
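A common way to reduce these per-element indirections (a sketch, not code from the post) is to hoist the inner array references out of the hot loops, so each level is dereferenced once per loop iteration instead of once per element:

```csharp
using System;

class HoistDemo
{
    static void Main()
    {
        // Small 2-level jagged array to keep the sketch short.
        double[][] jagged = new double[4][];
        for (int i = 0; i < 4; i++) jagged[i] = new double[8];

        double sum = 0;
        for (int i = 0; i < 4; i++)
        {
            // One indirection per row instead of one per element.
            double[] row = jagged[i];
            for (int j = 0; j < row.Length; j++)
                sum += row[j];
        }
        Console.WriteLine(sum); // 0 for a freshly allocated array
    }
}
```

Using `row.Length` as the loop bound also lets the JIT elide the bounds check on `row[j]`.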
pspet

One major factor in performance is the number of data-cache misses. Memory is divided into chunks called cache lines which, depending upon the machine, might be somewhere between 16-256 bytes or so. Accessing any byte of data in a cache line costs about as much as accessing everything in it. Very recently accessed cache lines are kept in a small cache within a CPU core and can be accessed again very quickly.

Lines that were not accessed recently enough to be in the first-level cache will be looked for in a second-level cache, which is bigger but not as fast to access. Lines which can't be found there may be looked for in a third-level cache (and, theoretically, a fourth, fifth, sixth, etc., though I don't think any machines go that far). An instruction that requires data not found in any cache may take many dozens of times longer to execute than one which can be satisfied from the level-1 cache.

Your program is perhaps not the best metric of the relative performance of linear versus jagged arrays, because you are using entirely sequential access. This means that most accesses are going to be handled by the fastest (level-1) cache. As pspet notes, dereferencing four nested objects takes more work than computing a single offset and using that. If everything is coming from the level-1 cache, the fact that the actual data access is cheap means that this extra effort will dominate.

I would suggest that you play around with varying the order of your loops and monitor the performance. Build in "release" mode and run without a debugger attached to get accurate timing results. I would conjecture that swapping your two inner loops will slow down both versions of the code about equally (most requests for data would likely not be satisfied by the first-level cache, but requests for the inner-level references would be), bringing their relative times closer together. Swapping all the loops will impair the performance of the linear-array version a bit more, but will likely cause the nested jagged array performance to be horrible (your outer array would probably stick around in the first-level cache, but the nested references probably would not, with the consequence that many element accesses would incur two or three full cache misses).

There is a performance penalty in .NET for arrays which occupy more than 85,000 bytes, especially if they are short-lived, so in many cases a two-level jagged array may be optimal. If data items are 64 bytes each, for example, two levels of nesting on a 64-bit system would allow one to have 10,000 arrays of 1,024 items each without any item growing beyond 85K. If you need more than 10,000,000 items, access patterns would determine whether you'd be better off using larger arrays or a third level of nesting, but there's a wide range of array sizes where the above approach is the best.
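As an illustration of the two-level scheme described above (a hypothetical sketch, not code from the answer), a flat logical index can be split into a chunk index and an offset, with each chunk sized so that no single array approaches the 85,000-byte large-object-heap threshold:

```csharp
using System;

// Hypothetical helper: presents a flat index over data stored as
// many small chunks, each well under the 85,000-byte LOH threshold.
class ChunkedArray
{
    const int ChunkSize = 1024;          // 1,024 doubles = 8,192 bytes per chunk
    readonly double[][] chunks;

    public ChunkedArray(long length)
    {
        long chunkCount = (length + ChunkSize - 1) / ChunkSize;
        chunks = new double[chunkCount][];
        for (long c = 0; c < chunkCount; c++)
            chunks[c] = new double[ChunkSize];
    }

    public double this[long index]
    {
        get => chunks[index / ChunkSize][index % ChunkSize];
        set => chunks[index / ChunkSize][index % ChunkSize] = value;
    }
}

class Program
{
    static void Main()
    {
        var a = new ChunkedArray(3000000);
        a[2500000] = 42.0;
        Console.WriteLine(a[2500000]); // 42
    }
}
```

The two divisions per access are the cost; the benefit is that neither the outer reference array nor any chunk ever lands on the large object heap.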

supercat