4

Essentially, I'm not sure how to store a 3D data structure for the fastest possible access, as I don't know what goes on under the hood with multi-dimensional arrays.

NOTE: The arrays will be a constant and known size each and every time, and each element will be exactly 16 bits.

Option one is to have a multi-dimensional array `data[16, 16, 16]` and simply access it via `data[x, y, z]`. Option two is to have a single-dimensional array `data[16 * 16 * 16]` and access it via `data[x + (y * 16) + (z * 16 * 16)]`.
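For concreteness, the two options look roughly like this (a sketch; `ushort` is assumed for the 16-bit elements, and the coordinates are purely illustrative):

```csharp
// Option one: a rank-3 rectangular array, indexed directly by coordinates.
ushort[,,] multi = new ushort[16, 16, 16];

// Option two: a flat array, with the index computed by hand.
ushort[] flat = new ushort[16 * 16 * 16];

int x = 3, y = 5, z = 7;  // illustrative coordinates

multi[x, y, z] = 42;                      // the runtime does the index arithmetic
flat[x + (y * 16) + (z * 16 * 16)] = 42;  // we do the index arithmetic ourselves

System.Console.WriteLine(multi[x, y, z] == flat[x + (y * 16) + (z * 16 * 16)]);  // True
```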

As each element should only be 16 bits long, and I suspect that a multi-dimensional array would internally store a lot of references to other arrays at a minimum of 32 bits each, that is a lot of wasted memory. However, I fear it may be faster than evaluating the expression in option two on every access, and speed is key to this project.

So, can anyone enlighten me as to how much difference in speed there is likely to be, compared to how much difference in memory consumption?

Benjamin James Drury
    Possible duplicate of [Multi-dimensional array vs. One-dimensional](http://stackoverflow.com/questions/5476000/multi-dimensional-array-vs-one-dimensional) – Mostafiz Apr 25 '16 at 11:57
  • I think this might help you [Why are multi-dimensional arrays in .NET slower than normal arrays?](http://stackoverflow.com/questions/468832/why-are-multi-dimensional-arrays-in-net-slower-than-normal-arrays) – Craveiro Apr 25 '16 at 11:59
    Single dimensional is likely to be faster: https://github.com/dotnet/coreclr/issues/4059#issuecomment-208491798, [but if you want to know which horse is faster you should race them](https://ericlippert.com/2012/12/17/performance-rant/). – Matthew Watson Apr 25 '16 at 12:03
  • benjamin-james-drury, I would change the question slightly to emphasise the fact that every element is 16 bits, as it makes the question different from similar questions, and has produced an interesting take in the answers – James R Apr 25 '16 at 12:25
  • My reasoning for not referring to existing questions was because of the specific case that I had, but I have learned that my thoughts were incorrect, so perhaps I should have defaulted to them. Nevertheless, thank you to everyone who gave input; I appreciate it and will indeed race the horses to see which is faster. – Benjamin James Drury Apr 25 '16 at 16:24
  • The section on performance here is a very useful introduction: http://www.slideshare.net/wvdang/five-things-every-win32-developer-should-know – Ben Apr 26 '16 at 09:07

3 Answers

3

I think it's worth pointing out that if your array dimensions are really all 16, then you can calculate the index for the array from (x, y, z) much more efficiently:

int index = x | y << 4 | z << 8;

And the inverse:

int x = index & 0xf;
int y = (index >> 4) & 0xf;
int z = (index >> 8) & 0xf;
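As a quick sanity check (a sketch, not part of the original answer), the shift form produces the same index as the multiplicative form from the question, and the decode recovers the original coordinates for all 4096 positions:

```csharp
for (int x = 0; x < 16; x++)
for (int y = 0; y < 16; y++)
for (int z = 0; z < 16; z++)
{
    int index = x | y << 4 | z << 8;
    bool ok = index == x + (y * 16) + (z * 16 * 16)  // matches the multiply form
           && (index & 0xF) == x                      // decode recovers x
           && ((index >> 4) & 0xF) == y               // ... and y
           && ((index >> 8) & 0xF) == z;              // ... and z
    if (!ok) throw new System.Exception($"round-trip failed at ({x},{y},{z})");
}
System.Console.WriteLine("all 4096 indices round-trip correctly");
```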

If this is the case, then I recommend using the single-dimensional array since it will almost certainly be faster.

Note that it's entirely possible that the JIT compiler would perform this optimisation anyway (assuming that the multiplication is hard-coded as per your OP), but it's worth doing explicitly.

The reason that I say the single-dimensional array would be faster is that the current JIT compiler lacks some optimisations for multi-dimensional array access, as discussed in [this thread](https://github.com/dotnet/coreclr/issues/4059).

That said, you should perform careful timings to see what really is the fastest.

As Eric Lippert says: "If you want to know which horse is faster, race your horses".
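As a starting point for that race, a rough benchmark sketch might look like the following (illustrative only: timings on an array this small are dominated by loop overhead, and a harness such as BenchmarkDotNet would give more trustworthy numbers):

```csharp
using System;
using System.Diagnostics;

class Race
{
    const int N = 16;
    const int Reps = 10000;

    static void Main()
    {
        var multi = new ushort[N, N, N];
        var flat = new ushort[N * N * N];
        long sum = 0;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < Reps; i++)
            for (int z = 0; z < N; z++)
                for (int y = 0; y < N; y++)
                    for (int x = 0; x < N; x++)
                        sum += multi[x, y, z];
        Console.WriteLine($"multi: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        for (int i = 0; i < Reps; i++)
            for (int z = 0; z < N; z++)
                for (int y = 0; y < N; y++)
                    for (int x = 0; x < N; x++)
                        sum += flat[x | y << 4 | z << 8];  // shifts assume N == 16
        Console.WriteLine($"flat:  {sw.ElapsedMilliseconds} ms");

        Console.WriteLine(sum);  // use the result, so the loops aren't eliminated
    }
}
```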

Matthew Watson
    And the reverse `x = index & 0xF`, `y = (index >> 4) & 0xF`, and `z = (index >> 8) & 0xF` – juharr Apr 25 '16 at 12:17
  • @downvoter: Care to let us know what's wrong with the answer? It would be quite useful. – Matthew Watson Apr 25 '16 at 12:19
    Single-dimensional array is faster only when you're accessing it sequentially, though, isn't it? If you're looking up based on known coördinates, the two should perform the same. – Luaan Apr 25 '16 at 12:35
  • @Luaan Did you read the link I posted about some compiler optimisations missing? – Matthew Watson Apr 25 '16 at 12:49
    Yeah, exactly. The missing optimization mentioned is hoisting part of the calculation outside of the inner loop, which means a lot of unnecessary multiplications. So if you're trying to access one specific place in the array, both should perform the same. The only case where they don't is when you're looping over "columns" in one "row" - the row offset is calculated over and over and over... At least that's how I understand the analysis. – Luaan Apr 25 '16 at 15:19
  • Whilst I haven't chosen this answer as I feel the other individual went into more detail, your method for finding the correct index and the reverse will be invaluable to me, so thank you. – Benjamin James Drury Apr 25 '16 at 16:18
3

C# stores multidimensional arrays as a single block of memory, so they compile to almost the same thing. (One difference is that there are three sets of bounds to check).

I.e. `arr[x, y, z]` compiles to much the same index arithmetic as the question's `arr[x + (y * 16) + (z * 16 * 16)]`, and will generally have similar performance characteristics.

The exact performance, however, will be dominated by the pattern of memory access and how well it exploits cache locality (at least for large amounts of data). You may find that nested loops over x, then y, then z are faster or slower than doing the loops in a different order, if one order does a better job of keeping currently used data in the processor cache.

This is highly dependent on the exact algorithm, so it isn't possible to give an answer which is correct for all algorithms.
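To illustrate loop ordering (a sketch using the question's flat layout, where x has stride 1): iterating x innermost walks memory sequentially, while iterating z innermost jumps 256 elements (512 bytes) per step. Note that a 16×16×16 array of `ushort` is only 8 KB and fits comfortably in L1 cache, so the effect is small here; it becomes significant for larger arrays.

```csharp
ushort[] data = new ushort[16 * 16 * 16];
for (int i = 0; i < data.Length; i++) data[i] = (ushort)i;

// Cache-friendly: the innermost index (x) has stride 1, so reads are sequential.
long sum1 = 0;
for (int z = 0; z < 16; z++)
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sum1 += data[x + y * 16 + z * 256];

// Cache-unfriendly: the innermost index (z) has stride 256 elements per step.
long sum2 = 0;
for (int x = 0; x < 16; x++)
    for (int y = 0; y < 16; y++)
        for (int z = 0; z < 16; z++)
            sum2 += data[x + y * 16 + z * 256];

System.Console.WriteLine(sum1 == sum2);  // True: same elements, different order
```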

The other cause of any speed reduction versus C or C++ is bounds-checking, which is still needed in the one-dimensional case. However, these checks are often, but not always, removed automatically.

Again, the exact algorithm will affect whether the optimiser is able to remove the bounds checks.

Your course of action should be as follows:

  • Write a naïve version of the algorithm with arr[x,y,z].
  • If it's fast enough you can stop.
  • Otherwise profile the algorithm to check it is actually array accesses which are the issue, analyse the memory access patterns and so on.
Ben
1

I would vote for the single-dimensional array; it should work much faster. Basically, you can write some tests performing your most common tasks and measuring the time spent. Also, if your array sizes are powers of two (2^n), it is much faster to compute the element position using a left-shift operation instead of a multiplication.

Nickolay Olshevsky