Fastest Array addressing

Question

I am running an image analysis code on an array storing information about the image. Unfortunately the code is very heavy and takes an average of 25s to run through a single frame. The main problem I see is the array addressing. Which is the fastest to run through a 2d array and are there at all any differences in

horizontal then vertical

for (int y = 0; y < array.Length; ++y)
    for (int x = 0; x < array[].Length; ++x)
        //Code using array[y][x]

and vertical then horrizontal?

for (int x = 0; x < array[].Length; ++x)
    for (int y = 0; y < array.Length; ++y)
        //Code using array[y][x]

Furthermore, I tried to avoid direct addressing and use pointers instead.

for (int y = 0; y < array.Length; ++y)
    int* ptrArray = (int*)array[0];
    for (int x = 0; x < array[].Length; ++x, ++ptrArray)
        //Code using ptrArray for array[y][x]

or

for (int x = 0; x < array[].Length; ++x)
    int* ptrArray = (int*)array[0];
    for (int y = 0; y < array.Length; ++y, ptrArray += array[].Length)
        //Code using ptrArray for array[y][x]

Any help is greatly appreciated. Max

I should have mentioned that the array is actually a BitmapData for bitmap color assignment :/ sry... — Max Z., Dec 13 '11 at 10:31
Have you tried coding up each solution and measuring how long it takes? That would give you the most accurate answer. But if I had to guess, I'd say that options 3 and 4 are probably slightly faster than options 1 and 2. — aroth, Dec 13 '11 at 10:34
If you take 25s for a single image, the code pieces you posted are clearly not the limiting parts. — CodesInChaos, Dec 13 '11 at 10:36
Your biggest problem is using a multi-dimentional jagged array. Could you turn this into a single-dimentional zero based array instead? — Cameron MacFarland, Dec 13 '11 at 10:37
I think your image processing speed depends on HOW you processing it. So what are you doing in loops? — Renatas M., Dec 13 '11 at 10:38
your current code is not robust, btw - you haven't fixed the array before taking a pointer — Marc Gravell, Dec 13 '11 at 10:42
I think the problem here is not the loops, but rather: the `//Code using {blah}`. If you **don't do anything** except the loops, how long does it take? We can't advise on `{blah}` without seeing `{blah}` — Marc Gravell, Dec 13 '11 at 10:43
The code in the loops can not be optimized much, because it is as lightweight as it will be (AForge). The problem was counting up rather than down. It reduced my full computational speed to <4s :) — Max Z., Dec 13 '11 at 11:44

score 3 · Accepted Answer · answered Dec 13 '11 at 10:33

3

One option is to use reverse looping (start your for() loop from array.Length down to 0)

That'll speed things up abit.

for example,

for (int x = array[].Length-1; x >= 0; --x)
    int* ptrArray = (int*)array[0];
    for (int y = array.Length-1; y >= 0 ; --y, ptrArray += array[].Length)
        //Code using ptrArray for array[y][x]

answered Dec 13 '11 at 10:33

Shai

25,159
9
44
67

1

How would this speed things up? The compiler should be smart enough to access the property only once since the array length won't change in the interim. – Merlyn Morgan-Graham Dec 13 '11 at 10:34
[link](http://www.mkyong.com/java/reverse-loop-versus-forward-loop-in-performance-java/) read this, for example – Shai Dec 13 '11 at 10:36
That link is for Java, not .NET, so it isn't guaranteed to apply. Also, [it seems there is some disagreement](http://stackoverflow.com/questions/1029992/is-the-inequality-operator-faster-than-the-equality-operator). I guess it's worth a shot though as it will take only a minute to try it out. – Merlyn Morgan-Graham Dec 13 '11 at 10:40
4

the link is for java; the JIT (not part of C#, but rather: the VM) can usually spot a regular vector `for` loop and remove the bounds check etc... I think this would need more clear profiling to support it, personally. – Marc Gravell Dec 13 '11 at 10:41
Maybe Max z. can check that for us. – Shai Dec 13 '11 at 10:43
I wouldn't even expect it to necessarily be true or untrue for every Java VM, never mind .NET. – Jon Hanna Dec 13 '11 at 11:09
I just rewrote my whole code to countdown, compare to zero and use pointers as often as possible. I reduced the speed from 25s to <4s. Thanks a lot guys :) – Max Z. Dec 13 '11 at 11:36
Excellent. Did you check which order you were hitting the boundaries and move from jagged to 2-dimenstional? – Jon Hanna Dec 13 '11 at 11:45
@JonHanna - This may be a bit late but maybe you are interested in this. I'm using BitmapData to generate the image which is a 2d array, which in fact is a long 1d array. To find the pointer I'm using `*.Scan0.ToPointer()` and to find the boundries I use `*.Stride` (where * is the BitmapData instance). This allows me to overjump possible unused spaces at the edge of the image and redefine the pointer to point at the beginning of each row of the image. – Max Z. Dec 13 '11 at 12:03
notice that comparing against zero works faster, as it is simpler in assembly to check for 0 as compared to some numeric value. – Peter Mar 28 '17 at 14:52

Jon Hanna · Answer 2 · 2011-12-13T11:14:35.737

The most important rule is that it's all theory until you profile. I don't hold with those who insist that profiling is everything (without some theory you're no better than a Cargo Cultist putting coconuts on their ears and waiting for the plane to come) but your theory can always be wrong or incomplete, so profiling is crucial.

Generally, we want the inner-scan to be horizontally (in terms of the array, rather than the image, though for most formats that's the same). The reason being that with an array like:

00 01 02 03 04 05 06 07 08 09
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29

It is going to be laid out as:

00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29

You want to be scanning along contiguous blocks that can be loaded into CPU caches and then used entirely, rather than scanning from block to block and needing to regularly change the CPU cache contents.

This is even more important if you try to parallelise the algorithm. You want each thread dealing with its own contiguous blocks of memory as far as both input and output goes, rather than not only suffering the way single-threaded code does with poor cache-hit-frequence but also causing each other's buffers to be dirtied and need refreshing. This can be the difference between parallelising leading to a speed boost and parallelising actually slowing things down.

Another thing is the difference between a 2-dimensional array byte[,] rather than an array of arrays byte[][], which your comment in your question "array[y][x]" makes me wonder if perhaps you are using. With the former to obtain arr[1,2] the logic is:

Check Bounds
Calculate position (simple fast arithmetic)
Retrieve value.

With the latter, the logic is:

Check bounds
Obtain array through pointer.
Check bounds
Retrieve value.

There is also less good memory cache-hit-frequence. The latter has benefits when "jagged" structures are needed, but that is not the case here. 2D is almost always faster than array of arrays.

Things I don't see as likely to help, but I would certainly try them in your situation:

You may find a boost from doing your 1d <=> 2d logic. Have a single-dimension array where idx = y * width + x. It shouldn't make an appreciable difference, but it's worth trying.

Optimisations do try to both hoist calls to .Length and omit needless bounds checking, so you may find manually hoisting and switching to pointer arithmetic doesn't gain anything, but in a case where you really do need to bring time down it is certainly worth profiling.

Finally. Have you profiled how fast your code is at scanning the array and doing nothing? It could be that another part of the code is the real bottleneck, and you're fixing the wrong thing.

Unless things have changed in the most recent .NET CLR, rectangular arrays in .NET have been notoriously slow and often the speedup comes in the reverse direction (going from `x[,]` to `x[][]`) rather than the direction suggested here. — Special Sauce, Nov 23 '15 at 19:17
One of the .NET implementation problems is that rectangular arrays can have non-zero bases which complicates many of the core operations. More detailed info here: http://blog.mischel.com/2013/05/08/are-jagged-arrays-faster-than-rectangular-arrays/ — Special Sauce, Nov 23 '15 at 19:28

score 1 · Answer 3 · answered Dec 13 '11 at 10:33

I have no idea, but you've already come up with the examples. So you could run your code samples in a loop and profile it yourself.

var sw = new Stopwatch();
sw.Start();
ExecuteMyCode();
sw.Stop();
Console.WriteLine("Time: " + sw.Elapsed);

You might be able to speed up your processing by using a multi-threading construct like Parallel.ForEach. This would work well if the code in your loop avoids dependencies between loop iterations.

score 0 · Answer 4 · answered Dec 13 '11 at 10:35

0

Can you goy unsafe? Pointer. THe problem with array is that you STILL have the boundary checks on every access. Pointers remove that. Note tha this is fully C# supported - but you need to put it into an unsafe block. It also means you must be ABLE to run unsafe code, which is not always a given.

http://msdn.microsoft.com/en-us/library/28k1s2k6.aspx

has a code sample.

answered Dec 13 '11 at 10:35

TomTom

61,059
10
88
148

1

The examples with `int*` (in the question) already do this. Note also that the JIT is usually able to remove bounds checks on vector/`for` loops. – Marc Gravell Dec 13 '11 at 10:44

score 0 · Answer 5 · answered Dec 13 '11 at 10:47

0

If it is possible, try to reallocate your array so that first dimension is less than the second one. It would speed up things drastically. Another solution is to reallocate the data in a single-dimension array as was proposed above.

answered Dec 13 '11 at 10:47

TechGeorgii

414
4
14

score 0 · Answer 6 · 2011-12-13T11:05:53.687

Always make sure that your innermost loop accesses contiguous memory.

This is usually the row of your image. Note, that in rectangular arrays, you should make this the last index: array[y,x].

this paper suggests that the built-in C# rectangular arrays (with the multiple indexes) are quite slow. I read this before, but this is the only reference I got. I would start with a linear array, and calculate an offset once for each row. Unmanaged will only help you in really trivial cases.

If a single frame takes 25 seconds, then it is either huuuuge, or you do very complex processing. In that case it is only interesting to spend effort on optimizing memory access if you access many input pixels for each output pixel.

Both... It's using FFT and filters for depth analysis – Max Z. Dec 13 '11 at 11:34 — Max Z., Dec 13 '11 at 11:34

Fastest Array addressing

6 Answers6