allocating dynamic 2D/3D arrays in c++ with contiguous memory

Question

I need to dynamically allocate 2D/3D arrays in c++. I would like these to have contiguous memory, for purposes of sending/receiving with MPI. Is there any advice on how to dynamically allocate 2D/3D matrices, with contiguous memory, and be able to access/mutate the elements of these matrices efficiently? I currently use a c++ template to manage matrices. I am showing a small excerpt below for the 3D case. To my understanding, this does not allocate memory contiguously.

template <class Type>
class Matrix
{

  private:

    // size of matrix
    int Nx_ = 0;
    int Ny_ = 0;
    int Nz_ = 0;

    // matrix
    Type*** A3_;

  public:

    // constructor
    Matrix(int Nx, int Ny, int Nz)
    {

      Nx_ = Nx; 
      Ny_ = Ny;
      Nz_ = Nz;

      A3_ = new Type**[Nx];

      for (int i = 0; i < Nx; i++)
      {

        A3_[i] = new Type*[Ny];
        for (int j = 0; j < Ny; j++)
        {
          A3_[i][j] = new Type[Nz];
        }

      }

    }

    // insert value 
    void put(int i, int j, int k, Type a) 
    {
      A3_[i][j][k] = a;
    }

    // get value 
    Type get(int i, int j, int k) 
    {
      return A3_[i][j][k];
    }

};

Many posts recommend using a 1D vector, and then accessing the elements using index math. Perhaps like:

// get value 
Type get(int i, int j, int k) 
{
   int NxNy=Nx_*Ny_;
   return Vec[i + j * Ny_ + k * NxNy];
}

My concern is that this requires 4 operations (2 additions and 2 multiplications) per get/put. That seems inefficient when working on very large matrices (~1 billion elements) using triple nested for-loops. However, given my limited experience with c and c++, it has also occurred to me that perhaps my original template class above also requires multiple operations per get/put, and they are just hidden under the "hood" of how c++ manages memory.

In summary: is there any advice on how to dynamically allocate 2D/3D matrices, with contiguous memory, and access their elements efficiently?

Your suggested solution is correct. You're also guessing right about the address calculation, the contigous approach is likely going to be faster as it is only dereferencing one pointer as opposed to three dependent memory loads in the nested case. To be really sure you would have to measure, but I'd be surprised if there is much speed difference between the two solutions when iterating the array. — chrysante, Feb 28 '23 at 17:23
[See this answer](https://stackoverflow.com/questions/21943621/how-to-create-a-contiguous-2d-array-in-c/21944048#21944048) and [this one](https://stackoverflow.com/questions/52068410/allocating-a-large-memory-block-in-c/52069368#52069368). — PaulMcKenzie, Feb 28 '23 at 17:33
Note that the answers at the links I posted allows you to access the pool of memory. Thus not only do you get the option of using the square brackets to access an element, but you also can add code to get to the pool and access the elements directly (not shown in the code, but it is very easy to add those routines). — PaulMcKenzie, Feb 28 '23 at 17:47
I'll echo @chrysante with the added notes 1) I believe the calculation for a "Row Major" approach would be i * NyNz + j * Nz + k and that the "Column Major" version would be i + j * Nx + k * NxNy 2) it should be possible to set this up in a way with an auxiliary class and custom operator[]'s that will permit you to use Vec[a][b][c] syntax without addition allocations for "index" arrays, if desired. — Avi Berger, Feb 28 '23 at 17:54
Thanks Chrysante, Paul, and Avi. These are all very helpful comments. I will explore the different options... — ntilton, Feb 28 '23 at 18:17

463035818_is_not_an_ai · Answer 1 · 2023-02-28T18:22:18.077

The advice you found is the right direction: Use a flat 1D data structure plus an index transformation.

Only your realization is suboptimal. I am going to only outline the general idea, no production ready code. Consider a loop like this:

for (int k=0; k < K; ++k) {
    for (int j=0; j < J; ++j) {
        for (int i=0; i < I; ++i) {
             auto x = get(i,j,k);
             // ...
        }
    }
}

Then you want to arrange your elements such that for fixed K and J all get(i,J,K) are close to each other in memory. Thats what you already have. Now you just need a way to access them quick, and that can be achieved via an iterator. An iterator can be simply a pointer. You can do something along the line of:

for (int k=0; k < K; ++k) {
    for (int j=0; j < J; ++j) {
        for (Type* it = get_iterator(i,j); it != get_iterator(i,j+1); ++it) {
             auto x = *it;
             // ...
        }
    }
}

++it is just incrementing a pointer. This is as cheap as it gets to traverse the elements.

Thanks for the help! I like the iterator, and also now realizing I need to pay more attention to how the elements are ordered in memory. — ntilton, Feb 28 '23 at 18:21

score 0 · Answer 2 · answered Feb 28 '23 at 18:23

Allocate one-dimensionally and use index conversion. You can be more elegant than having an get and set method:

class Matrix {
private: double* data; int n;
public:
  double& operator()(int i,int j,int k) {
    return data[ i*n*n + j*n + k ];
}

That way you can both x = mymat(i,j,k) and mymat(i,j,k) = x.

Let me also remark that you can use the Vector and Subarray datatypes for sending your matrix and parts of it.

allocating dynamic 2D/3D arrays in c++ with contiguous memory

2 Answers2