4

What is the right way to declare a multi-dimensional array that does not change size during runtime on the heap? (ideally in C++11, if there is some feature only available in C++14 (not C++17) I would love to hear about it too, but chances are it would not work for me)

I have by now looked through dozens of questions and answers about this topic but none seems to really answer it/some answers conflict with others.

The following solutions I have found and the problems they seem to have that make them non-viable (most of these taken from SO answers and their comments, examples all given with assuming a 3D-array as the target):

  • Normal [][][] array declared with new/declaring an array of pointers
    Problem: Non-contiguous in memory, every individual array has its independent location in memory

  • multiple std::arrays/boost::arrays nested inside each other
    Problem: Non-contiguous in memory, every individual array has its independent location in memory

  • Matrix
    Problem: Just a container for std::array, same problems apply basically

  • multiple std:vectors nested inside each other
    Problem: Dynamic, pretty much all the other problems mentioned before

  • Declare as a single block with a pointer to a normal [] array, then go through the index by calculation during runtime with a function like GetIndex(array,x,y,z)
    Problem: This seems to tick all points, but this solution seems less than ideal because of the significant CPU-overhead this seems to introduce when you need to access/change the elements often

Little bit unrelated to that, I have also had some issues with these solutions if they were in classes and I had to access their values from the outside with the . operator, so I would be even more grateful if someone could tell the correct solution with an example of both correct declaration and correct access of the heap-allocated multidimensional-array as a class-member.

trincot
  • 317,000
  • 35
  • 244
  • 286
uncanny
  • 65
  • 1
  • 10
  • Why does it have to be contiguous? –  Nov 28 '19 at 05:29
  • Generally the purpose of an array, and the reason it is so fast, is to have all the values in one block rather than spread around everywhere, causing all kinds of trouble like cache-misses and so on. – uncanny Nov 28 '19 at 05:33
  • 1
    Declare a class with template parameters which takes dimensions. You can use C arrays in the implementation. – abhiarora Nov 28 '19 at 05:35
  • 1
    Are the sizes known at compile time, or just unchanging during runtime? Do you need (also) to be able to do 1D pointer arithmetic throughout the array? – Davis Herring Nov 28 '19 at 05:38
  • 1
    `but this solution seems less than ideal because of the significant CPU-overhead` Have you measured this overhead? Calculating index from components is a simple arithmetical operation. Perhaps its performance will be suitable and you don't need to increase your program complexity over such mundane task. – Ari0nhh Nov 28 '19 at 05:45
  • @DavisHerring I am assuming you cannot assign the size of a normal array with a variable that is not known during compile-time? Yes, the size is always declared with a #define. – uncanny Nov 28 '19 at 05:48
  • You don't have to know the size of the array at compile time with `new`. –  Nov 28 '19 at 05:49
  • @Ari0nhh The question is about how to do it "the right way", you could also interpret it as the theoretically best way to do it that I am asking for. Also, since these arrays will in my case make the core of the program in a game-dev situation (and I need to guarantee that some functions revolving around them will finish inside one frame in some cases), the less overhead there is, the more features it will allow to have implemented and the more arrays can be existing at one time. – uncanny Nov 28 '19 at 05:51
  • If you're worried about speed and cache misses, then you might want to read [this answer](https://stackoverflow.com/a/38508513/10957435). –  Nov 28 '19 at 05:54
  • @uncanny: You get one runtime array dimension with `new`: the first one. And the result is always contiguous. But you didn’t answer about 1D indexing, which is critical. – Davis Herring Nov 28 '19 at 05:59
  • @DavisHerring I am sorry, no I do not think it is "necessary" per-se. If I am correct the last solution I listed in my question necessitates it though if I am correct, right? Ideally it would just handle like a normal multi-D array with being accessed through [x][y][z], but nothing like that is a requirement. – uncanny Nov 28 '19 at 06:03
  • Also, [`std::array`s actually are contiguous](https://stackoverflow.com/q/6632915/10957435) (since C++11, I believe), though I'm not sure how this affects multi-dimensional arrays. –  Nov 28 '19 at 06:12
  • Wouldn’t nested `std::array`s be contiguous in memory? – NicholasM Nov 28 '19 at 06:15
  • 2
    `static` and 'on the heap' are mutually contradictory. Which do you mean? – user207421 Nov 28 '19 at 06:19
  • 1
    Do you need a 2D or 3D array?. You can simply declare a 2D array that will be contiguous in memory by simply declaring and allocating for a *pointer-to-array of int [width]*. E.g., a 5x10 continuous would be `int (*arr)[10] = new int [5][10];` That provides direct 2D array addressing, e.g. `arr[i][j]` within the contiguous block.. – David C. Rankin Nov 28 '19 at 07:18
  • @DavidRankin-ReinstateMonica The examples are all in 3D, ideally it would be viable for X dimensions. – uncanny Nov 28 '19 at 07:23
  • 1
    The same should hold for any dimension. For a 3D array, you would declare/allocate for a *pointer to array of int [rows][width]*. E.g. `int (*arr3)[5][10] = new int [5][5][10];` for a 5x5x10 array and so on. – David C. Rankin Nov 28 '19 at 07:26
  • @DavidRankin-ReinstateMonica Are you sure that is contiguous? That looks to me like an array storing pointers for the first dimension, which means it is not, but I could be wrong there. – uncanny Nov 28 '19 at 07:52
  • 1
    Yes, Essentially you are allocating storage for 5 5x10 arrays as a single block of memory above. You can dump the pointers for each element to confirm. It is a *single-pointer* to a 5x10 array and you allocate storage for as many of them as you need using the 3rd dimension. – David C. Rankin Nov 28 '19 at 08:05
  • @DavidRankin-ReinstateMonica Well thanks a lot that was what saved me. I have not looked at the memory yet/did not dump anything, but from what things look performance-wise right now it works like an absolute charm. Works perfectly like a normal multi-D array too even after a function-pass, so I would be really interested if there is anyone who can come up with a solution that is better than this. – uncanny Nov 28 '19 at 11:40

4 Answers4

1

Normal [][][] array declared with new/declaring an array of pointers Problem: Non-contiguous in memory, every individual array has its independent location in memory

...

Yes, the size is always declared with a #define. – uncanny

Yeah, C++ multidimensional arrays are tricky and quite good at making C++ code unreadable. Actually, you can create static multidimensional array if sizes are known at compile time, so you will get it allocated as a contiguous chunk of memory.

int main()
{
    int arr[100][200][100]; // allocate on the stack

    return 0;
}

The question is how to allocate it on the heap... No problem, just wrap it into a struct, and allocate this struct on the heap.

#include <memory>

struct Foo
{
    int arr[100][200][100];
};

int main()
{
    auto foo = std::make_unique<Foo>(); // allocate on the heap
    auto& arr = foo->arr;

    arr[1][2][3] = 42;

    return 0;
}

The std::make_unique call allocates Foo on the heap, and guarantees that memory will be deallocated. Also, you are able to access the array inside and outside of Foo with almost zero amount of boilerplate code. Nice!

Stas
  • 11,571
  • 9
  • 40
  • 58
1

The right way is to write/use a multidimensional array class. Multidimensional arrays are fundamental objects throughout computer science and it's (IMO) insane that the STL never included first-class support for multidimensional arrays. Internally, the class should allocate a 1d array on the heap (for runtime-sized arrays) and do the arithmetic to convert multidimensional indices into 1d indices.

Eigen is a good choice if you're doing numerical work; not sure how useful it is for multidimensional arrays of a non-numeric type.

japreiss
  • 11,111
  • 2
  • 40
  • 77
  • Thank you for your answer. Can you say why this solution is better than the solution David Rankin provided in the comments? – uncanny Nov 28 '19 at 11:50
  • David's solution requires that all but the first dimension of the array are known at compile time, because they are a part of the type declaration of `arr`. – japreiss Nov 28 '19 at 11:57
0

If all but the first dimension are compile-time constants (whether or not the first is), just write new T[x][Y][Z] or (more safely) std::make_unique<T[][Y][Z]>(x). The result is contiguous and the compiler has every opportunity to apply tricks like shifts instead of multiplications if appropriate to the dimensions. What you can’t do with such an entity is pass it as pointer and size to a function expecting a one-dimensional array:

f(&a3[0][0][0],x*Y*Z);  // undefined behavior

because pointer arithmetic is defined only within one T[] array (here, the array a3[0][0] that is a T[Z]).

If the first dimension is also constant, you can use a nested std::array (which has no extra memory overhead in practice) or just

struct A3 {
  T a[X][Y][Z];
};

Either has the advantage that it can be passed and returned by value and used as a standard container element. Such an object can of course also be passed by reference, or you can use an “array parameter”:

T f(A3 &a3) {
  return a3.a[0][0][1]+a3.a[0][1][0]+a3.a[1][0][0];
}
T g(T a[][Y][Z]) {
  return a[0][0][1]+a[0][1][0]+a[1][0][0];
}

Note that g’s parameter type is actually T (*)[Y][Z], which is why the first bound may be omitted. If you have

A3 *a3;

you would call these as

f(*a3);
g(a3->a);

but that’s just standard pointer usage and has nothing to do with the array type or the heap allocation.

Davis Herring
  • 36,443
  • 4
  • 48
  • 76
  • So, if I go with the struct option, if I declare MyClass.A3 array*; , pass that as pointer with MyClassInstance.array to function(A3 *array); , inside I can just do *array.a[X][Y][Z] = value; on it is what you are meaning? If not, could you please provide an example with correct usage? – uncanny Nov 28 '19 at 06:31
  • @uncanny: The syntax in your comment is broken in several respects; I added an example, but don’t expect it to be a substitute for a book on the subject. – Davis Herring Nov 28 '19 at 06:48
  • Well, this is not exactly what I asked for because I was aware of all of that, I was just quickly writing something up to point to the last paragraph in my original question but thank you for the effort of editing your answer. – uncanny Nov 28 '19 at 06:53
-1

Memory is nearly always one-dimensional. What we see as multi-dimensional arrays in languages are an illusion created by the compiler.

Let's take an example.

int numbers_1[10];
int numbers_2[2][5];

These both allocate enough storage for 10 integers and leaving type-safety - as might be in C/C++ - aside, are equivalent from the point of view of a processor. Indeed, you can transform - treat either as having however many dimensions as you like - one into the other through type casting pointers.

numbers_1[0] == ((int**)numbers_1)[0][0];

This expression holds true, so forth for all the 10 elements.

It does not matter what the storage class of an array is, they are all really just one dimensional.

Also, I believe it will help if you read about how Turing machines work. The tape is one dimensional even in UTM and that is the most powerful theoretical machine we have.

Tanveer Badar
  • 5,438
  • 2
  • 27
  • 32
  • Yes, but would that be the most advisable solution? It would be the most ideas to pass as a normal array declared by new by pointer and then cast it to a multi-D array inside the function is what you are trying to say? – uncanny Nov 28 '19 at 06:58
  • That's up to whatever use case you have in mind. I am just explaining how multi-dimensional arrays are an illusion. You can certainly allocate a big enough chunk with `new` and then treat it as having k dimensions. But that throws away all the safety the compiler and library may offer, so if you manage to shoot yourself in the foot or face you would have no one else to blame except yourself. – Tanveer Badar Nov 28 '19 at 06:59
  • Also see the last paragraph. – Tanveer Badar Nov 28 '19 at 07:03
  • 1
    Well, I assume then it is more meant as a comment because it does not directly answer the question and I was indeed aware of that and the question is simply about the best way to manage that. Arrays actually guarantee that the memory is one single block even when going multi-D, compared to std::vector etc. Your solution is probably better than my fitfth point in my question, or more of an improval of it, but as you explain it correctly, it is probably still not "perfect" in the traditional sense as you just throw away pretty much every single tool offered to you by the language. – uncanny Nov 28 '19 at 07:06
  • I meant last paragraph of my answer, I just added it. :) What you ask will eventually be simulated by the implementation which **is** 1-d, even in the theoretical models computers are based upon. – Tanveer Badar Nov 28 '19 at 07:08
  • That cast doesn’t make any sense at all (strict aliasing aside): what `int*` is *stored* at the beginning of the array? – Davis Herring Nov 28 '19 at 07:12
  • Yes I read it and I again thank you very much for the effort. Nothing really new here but I am sure once this question turns up on Google it will be helpful for a lot of people as further input to increase their understanding. Though I would personally disagree to it honestly because in many modern workloads actually a tree is faster to handle large datasets than a tape. Not in this case, certainly, but it is too much of a simplification in general. But that is what I alluded to before, I am not sure if it is really a fitting thing to explain in the context of this question. – uncanny Nov 28 '19 at 07:12
  • @DavisHerring Why does it not make sense? I am simply treating that as a 2-d array decayed to pointer. – Tanveer Badar Nov 28 '19 at 07:21
  • 3
    `((int**)numbers_1)[0]` is wrong (and UB). You probably meant `(reinterpret_cast(numbers_1)[0][0]` (but it is UB too). – Jarod42 Nov 28 '19 at 09:00
  • @Jarod42 Do you know _why_ it is UB? I couldn't find anything on en.cppreference.com for either `reinterpret_cast` or `static_cast` or `(T)expr` style cast which lists this as allowed conversion. My question is what'll go wrong if this was not UB? – Tanveer Badar Nov 28 '19 at 10:15
  • 2
    @TanveerBadar: The casts themselves are fine; it’s the usual [TBAA](https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_aliasing) rules that prohibit using the result (except to cast it back). Your version (with `int**`) would, if we ignore UB, look at the beginning of the array, interpret the first 4/8 bytes as an `int*`, and then (try to) read an `int` from **that** utterly fictitious location. Jarod42’s version would work fine in a world without this UB; it reshapes the array rather than supposing a pointer to exist in memory. – Davis Herring Nov 28 '19 at 15:33
  • @DavisHerring Thanks for the explanation! I certainly have a lot of thinking to do now, a few mental models to adjust. – Tanveer Badar Nov 29 '19 at 07:26