
I am trying to allocate a large memory block for a 3D matrix of floating point values in C++. Its dimensions are 44100x2200x2. This should take exactly 44100x2200x2x4 bytes of memory, which is about 7.7 GB. I am compiling my code with g++ on a 64-bit x86 machine running Ubuntu. When I watch the process in htop, I see the memory usage grow to 32 GB, and the process is promptly killed. Did I make a mistake in my memory calculation?

This is my code:

#include <iostream>

using namespace std;
int main(int argc, char* argv[]) {
  int N = 22000;
  int M = 44100;
  float*** a = new float**[N];
  for (int m = 0; m<N; m+=1) {
    cout<<((float)m/(float)N)<<endl;
    a[m] = new float*[M - 1];
    for (int n = 0; n<M - 1; n+=1) {
      a[m][n] = new float[2];
    }
  }
}

EDIT: My calculation was incorrect, and I was actually allocating closer to 38 GB. I have now fixed the code to allocate 15 GB.

#include <iostream>

using namespace std;
int main(int argc, char* argv[]) {
  unsigned long  N = 22000;
  unsigned long  M = 44100;
  unsigned long blk_dim = N*(M-1)*2;
  float* blk = new float[blk_dim];
  unsigned long b = (unsigned long) blk;

  float*** a = new float**[N];
  for (int m = 0; m<N; m+=1) {
    unsigned long offset1 = m*(M - 1)*2*sizeof(float);
    a[m] = new float*[M - 1];
    for (int n = 0; n<M - 1; n+=1) {
      unsigned long offset2 = n*2*sizeof(float);
      a[m][n] = (float*)(offset1 + offset2 + b);
    }
  }
}
lee
  • That means 7.7GB of **contiguous** memory. This is quite a difficult requirement for an allocator, and out of reach of many systems. Could you split it into many smaller blocks? – Alejandro Aug 29 '18 at 01:52
  • @Alejandro How is it (as in, the impl.) contiguous memory? Isn't each 'new' separate/non-contiguous (and possibly less efficient due to such)? Or was "quite a difficult task" referring to it not being such? – user2864740 Aug 29 '18 at 01:54
  • I can't really split it up at all. What if I allocated a single dimensional array of size 44100x2200x2 and do some pointer magic to get it in the matrix form that I'm currently using? – lee Aug 29 '18 at 01:55
  • @SpentDeath There are already pointers - `new float[2]`. Could do one allocation for the edge-nodes and then assign these different pointers to the array (or hide the abstraction to eliminate the pointers entirely). I believe the original memory requirements undercount due to not counting the pointers-to the allocated objects. That is, there are N x M float[2]'s created (+ containers), so that's at a minimum 8 for the object and 8 for the (64-bit) pointer - *excluding* all other housekeeping from the allocator. – user2864740 Aug 29 '18 at 01:58
  • 44100x2200x4 bytes are used for the pointers. Unless I did the math wrong, this means 0.3 gb, nowhere close to the 32 gb maximum. – lee Aug 29 '18 at 02:00
  • @Alejandro *Virtually* contiguous, yes, but not physically contiguous. It's a lot of memory, but on Linux, it's going to go straight to `mmap(2)`. – Jonathon Reinhart Aug 29 '18 at 02:06
  • @SpentDeath Just saying that `MxNx4x2` (claimed at 7.7GB) is, at best, only counting *half* the "required" memory.. try `MxNx16` as each float[2] is 8 bytes of data and a saved 8 byte pointer in the immediate array container (this adjustment still does not count any internal allocator overheads). – user2864740 Aug 29 '18 at 02:08
  • What do you mean by container array? These are all primitive types, I'm not using any STL features. Why would there be any internal allocator overhead? Do you mean from the OS? – lee Aug 29 '18 at 02:13
  • @SpentDeath Container: `a[m] = new float*[M - 1];` There a N such "a[m]" containers created. Each element is 8 bytes, a 64-bit pointer to each of the edge-float[2] objects (which are 8 bytes of data *not including* any internal allocator overheads). So *minimum* expected size is `MxNx8` (containers) + `MxNx8` (float[2]) = `MxNx16`, which is twice as much as the initial hypothesis of `MxNx2x4`. – user2864740 Aug 29 '18 at 02:13
  • 44100x2200x2x4 ~7.7 gb is just for the matrix elements. If we are considering the overhead from pointers, there are N*(M -1) pointers using 4 bytes each. That's 44100x2200x4 ~ 400mb. – lee Aug 29 '18 at 02:17
  • How much does each float[2] take to *allocate*? (8 bytes, 'assuming' just data). How much does each `float*` (pointer to the created float[2]) take? (8 bytes, for 64-bit pointer). Thus "memory of edge matrix elements" == "memory of pointers to said edge matrix elements", as there is exactly one pointer per allocated float[2]. (Then there is the *additional* bookkeeping not reflected..) – user2864740 Aug 29 '18 at 02:19
  • @SpentDeath 4 bytes? – xaxxon Aug 29 '18 at 02:19
  • @user2864740 I was wrong in my overhead calculation. It should be 44100x2200x8 bytes ~ 800mb. But 44100x2200x2x4 + 44100x2200x8 is still less than 32gb. – lee Aug 29 '18 at 02:23
  • @SpentDeath 1) Saying the *wrong thing* many times does not make it correct [8 *cannot* be correct, for reasons stated above]; 2) N in code is 22,000 - not 2,200; (3. And if any attention was paid to parentheticals..) – user2864740 Aug 29 '18 at 02:27
  • Can you explain why you say "I am trying to allocate a large memory block" and then post code where you actually allocate millions of small memory blocks? – M.M Aug 29 '18 at 02:48

3 Answers


You forgot one dimension, and the overhead of allocating memory. The code shown allocates memory very inefficiently in the third dimension, resulting in way too much overhead.

float*** a = new float**[N];

This will allocate roughly 22000 * sizeof(float **), which is roughly 176 KB. Negligible.

a[m] = new float*[M - 1];

A single allocation here will be for 44099 * sizeof(float *), but you will make 22000 of these: 22000 * 44099 * sizeof(float *), or roughly 7.7 GB of additional memory. This is where you stopped counting, but your code isn't done yet. It's got a long way to go.

a[m][n] = new float[2];

This is a single allocation of 8 bytes, but it will be performed 22000 * 44099 times. That's another 7.7 GB flushed down the drain. You're now at over 15 GB, roughly, of memory that the application itself requires.

But each allocation does not come for free, and new float[2] requires more than 8 bytes. Each individually allocated block must be tracked internally by your C++ library so that it can be recycled by delete. The most simplistic linked-list-based heap allocator requires one forward pointer, one backward pointer, and a count of how many bytes are in the allocated block. Assuming nothing needs to be padded for alignment purposes, that is at least 24 bytes of overhead per allocation on a 64-bit platform.
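To make that bookkeeping concrete, here is a rough illustration only; this is not the actual layout used by glibc or any particular allocator:

#include <cstddef>

// Hypothetical header a naive free-list allocator might prepend to every block.
struct block_header {
    block_header* prev;  // 8 bytes on a 64-bit platform
    block_header* next;  // 8 bytes
    std::size_t   size;  // 8 bytes: usable size of the block
};

// 24 bytes of bookkeeping per allocation, before any alignment padding --
// three times the 8 bytes of data in each `new float[2]`.
static_assert(sizeof(block_header) == 24, "assumes 64-bit pointers");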

Now, your third dimension makes 22000 * 44099 allocations, your second dimension makes 22000 allocations, and your first dimension makes one allocation. If I count on my fingers, that's (22000 * 44099 + 22000 + 1) * 24 bytes, or another roughly 22 gigabytes of memory, just for the overhead of the simplest, most basic memory allocation scheme.

We're now up to about 38 gigabytes of RAM needed, using the simplest possible heap allocation tracking, if I did my math right. Your C++ implementation is likely to use slightly more sophisticated heap allocation logic, with larger overhead.

Get rid of the new float[2]. Compute your matrix's size and new a single 7.7 GB chunk, then calculate where the rest of your pointers should point. Also allocate a single chunk of memory for the second dimension of your matrix, and compute the pointers for the first dimension.

Your allocation code should execute exactly three new statements: one for the first dimension's pointers, one for the second dimension's pointers, and one for the huge chunk of data that comprises your third dimension.
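A minimal sketch of that layout, using the N and M from the question (untested, and without the error handling real code would want; note that the second-dimension pointer table alone is still roughly 7.7 GB, which is why the edited code in the question lands at about 15 GB):

#include <cstddef>

int main() {
    const std::size_t N = 22000, M = 44100;

    float*   pool = new float[N * (M - 1) * 2];   // one block for all the data (~7.7 GB)
    float**  rows = new float*[N * (M - 1)];      // one block of second-dimension pointers (~7.7 GB)
    float*** a    = new float**[N];               // one block of first-dimension pointers (~176 KB)

    for (std::size_t m = 0; m < N; ++m) {
        a[m] = rows + m * (M - 1);
        for (std::size_t n = 0; n < M - 1; ++n)
            a[m][n] = pool + (m * (M - 1) + n) * 2;
    }

    a[100][200][1] = 3.14f;   // indexed exactly like the original triple-pointer version

    delete[] a;
    delete[] rows;
    delete[] pool;
}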

Sam Varshavchik
  • Any allocation scheme that uses that much overhead for 8-byte blocks is garbage. A reasonable allocation scheme will use at most 8 bytes per allocation, and a good one will use only a small block per page of memory dedicated to a particular size block, so only 1-2% overhead for 8 byte allocations. – Chris Dodd Aug 29 '18 at 02:34
  • This was exactly my issue! I updated my code to utilize 15gb, and htop verified that I am indeed just using 15gb. Is there any way to improve my memory usage to just use 7.7gb? The only hack I can think of is to just use the blk array and index [m][n] by a linear transform to the correct index. I guess I could use a macro to make it nicer, but other than the hacky ways, is there anything else? – lee Aug 29 '18 at 03:26
  • Any reason why you can't just do `class BigArray {public: float _bigArray[44100][22000][2];}; BigArray * myBigArray = new BigArray;` ? That will use 'only' 7.7GiB. – Jeremy Friesner Aug 29 '18 at 03:37
  • 44100 and 22000 are both cli parameters. They are hard coded right now as an example. – lee Aug 29 '18 at 03:52
  • In that case, you could go with `float * bigArray = new float[M*N*O];` and write getter and setter methods that compute the correct offset into the one-dimensional array (see the sketch after this comment thread). The main thing is that since the array is regular, you only need to do a single call to `new`. – Jeremy Friesner Aug 29 '18 at 04:00
  • @ChrisDodd Have some good references wrt. allocation techniques used in a 'common' implementation? Mostly I find answers with "implementation specific" .. – user2864740 Aug 29 '18 at 04:12
  • @SpentDeath Seriously consider whether you need *contiguous* memory, because your posted sample doesn't deliver that at all. Yes, it would give you the best cache continuity, but at that size you're blowing through your cache so fast during enumeration I doubt it will matter much. A more-sparse approach may be in order here. If some API you're invoking requires 7.7 GB of contiguous memory, a sparse approach obviously won't work, however. Some creativity with an outer-most unordered map, an inner vector, and a final array, may suit your needs, perform well enough, and be kinder memory-wise. – WhozCraig Aug 29 '18 at 04:52
  • @WhozCraig What do you mean by sparse? – lee Aug 29 '18 at 05:45
  • @SpentDeath I mean you're allocating space for an enormous number of cells. If the majority of those aren't actually needed (they would otherwise carry some default value anyway) *and* you don't need to feed part or all of this monster to some API that requires contiguous memory (which your posted code in-question didn't have anyway), you can utilize other options. Mapping int to maps of int to 2-cell arrays for example. The performance would not be as good (obviously, we're talking hashing via unordered maps in the best-case), but the memory utilization could be *considerably* better. – WhozCraig Aug 29 '18 at 07:34
  • @WhozCraig All entries in the matrix are non-zero. This is a precomputed table that gets used in many operations. It needs to be kept in-memory and all parts need to be accessible. You can't keep subsections of it in memory, since there can be computations that iterate through all matrix values, and can require access to vectors in arbitrary position in the matrix. – lee Aug 29 '18 at 19:43
  • @SpentDeath So full enumeration is required, but block-continuity beyond vectors is *not*? That makes a difference. – WhozCraig Aug 29 '18 at 20:02
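Regarding the single-allocation idea raised in the comments above, a rough sketch of what such a getter could look like; the `Flat3D` name and interface are made up purely for illustration:

#include <cstddef>
#include <vector>

// Hypothetical wrapper: one flat buffer plus an index computation.
struct Flat3D {
    std::size_t M, N, O;
    std::vector<float> data;

    Flat3D(std::size_t m, std::size_t n, std::size_t o)
        : M(m), N(n), O(o), data(m * n * o) {}

    float& at(std::size_t i, std::size_t j, std::size_t k) {
        return data[(i * N + j) * O + k];  // row-major offset
    }
};

// Usage: Flat3D table(44100, 22000, 2); table.at(5, 7, 1) = 0.25f;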

Just to round out one answer already given, the example below is basically an extension of the answer given here on how to create a contiguous 2D array, and illustrates the usage of only 3 calls to new[].

The advantage is that you keep the [][][] syntax you would normally use with triple pointers (although I highly advise against writing code using "3 stars" like this, we have what we have). The disadvantage is that extra memory is allocated for the pointers, in addition to the single memory pool for the data.

#include <iostream>
#include <exception>

template <typename T>
T*** create3DArray(unsigned pages, unsigned nrows, unsigned ncols, const T& val = T())
{
    T*** ptr = nullptr;  // allocate pointers to pages
    T** ptrMem = nullptr;
    T* pool = nullptr;
    try 
    {
        ptr = new T**[pages];  // allocate pointers to pages
        ptrMem = new T*[pages * nrows]; // allocate pointers to pool
        pool = new T[nrows*ncols*pages]{ val };  // allocate pool

        // Assign page pointers to point to the pages memory,
        // and pool pointers to point to each row the data pool
        for (unsigned i = 0; i < pages; ++i, ptrMem += nrows)
        {
            ptr[i] = ptrMem;
            for (unsigned j = 0; j < nrows; ++j, pool += ncols)
                ptr[i][j] = pool;
        }
        return ptr;
     }
     catch(std::bad_alloc& ex)
     {
         // rollback the previous allocations
        delete [] ptrMem;
        delete [] ptr;
        throw ex; 
    }
}

template <typename T>
void delete3DArray(T*** arr)
{
    delete[] arr[0][0]; // remove pool
    delete[] arr[0];  // remove the pointers
    delete[] arr;     // remove the pages
}

int main()
{
    double ***dPtr = nullptr;
    try 
    {
        dPtr = create3DArray<double>(4100, 5000, 2);
    }
    catch(std::bad_alloc& )
    {
        std::cout << "Could not allocate memory";
        return -1;
    }
    dPtr[0][0][0] = 10;  // for example
    std::cout << dPtr[0][0][0] << "\n";
    delete3DArray(dPtr);  // free the memory
}

Live Example


Here is a rudimentary RAII class. It has not been tested thoroughly, but should work without any issues.

#include <exception>
#include <algorithm>
#include <iostream>

template <typename T>
class Array3D
{
    T*** data_ptr;
    unsigned m_rows;
    unsigned m_cols;
    unsigned m_pages;

    T*** create3DArray(unsigned pages, unsigned nrows, unsigned ncols, const T& val = T()) 
    {
        T*** ptr = nullptr;  // allocate pointers to pages
        T** ptrMem = nullptr;
        T* pool = nullptr;
        try
        {
            ptr = new T**[pages];  // allocate pointers to pages
            ptrMem = new T*[pages * nrows]; // allocate pointers to pool
            pool = new T[nrows * ncols * pages]{ val };  // allocate pool

            // Assign page pointers to point to the pages memory,
            // and pool pointers to point to each row the data pool
            for (unsigned i = 0; i < pages; ++i, ptrMem += nrows)
            {
                ptr[i] = ptrMem;
                for (unsigned j = 0; j < nrows; ++j, pool += ncols)
                    ptr[i][j] = pool;
            }
            return ptr;
        }
        catch (std::bad_alloc& ex)
        {
            // rollback the previous allocations
            delete[] ptrMem;
            delete[] ptr;
            throw ex;
        }
    }

    public:
        typedef T value_type;
        T*** data() { return data_ptr;  }

        unsigned get_num_pages() const { return m_pages; }
        unsigned get_num_rows() const { return m_rows; }
        unsigned get_num_cols() const { return m_cols; }

        Array3D() : data_ptr{}, m_rows{}, m_cols{}, m_pages{} {}

        Array3D(unsigned pages, unsigned rows, unsigned cols, const T& val = T())
        {
            if ( pages == 0 )
                throw std::invalid_argument("number of pages is 0");
            if (rows == 0)
                throw std::invalid_argument("number of rows is 0");
            if (cols == 0)
                throw std::invalid_argument("number of columns is 0");
            data_ptr = create3DArray(pages, rows, cols, val);
            m_pages = pages;
            m_rows = rows;
            m_cols = cols;
        }

        ~Array3D()
        {
            if (data_ptr)
            {
                delete[] data_ptr[0][0]; // remove pool
                delete[] data_ptr[0];  // remove the pointers
                delete[] data_ptr;     // remove the pages
            }
        }

        Array3D(const Array3D& rhs) : m_rows(rhs.m_rows), m_cols(rhs.m_cols), m_pages(rhs.m_pages)
        {
            data_ptr = create3DArray(m_pages, m_rows, m_cols);
            std::copy(&rhs.data_ptr[0][0][0], &rhs.data_ptr[m_pages - 1][m_rows - 1][m_cols], &data_ptr[0][0][0]);
        }

        Array3D& operator=(const Array3D& rhs)
        {
            if (&rhs != this)
            {
                Array3D temp(rhs);
                swap(*this, temp);
            }
            return *this;
        }

        Array3D(Array3D&& rhs) noexcept : data_ptr(rhs.data_ptr), m_pages(rhs.m_pages), m_rows(rhs.m_rows), m_cols(rhs.m_cols)
        {
            rhs.data_ptr = nullptr;
        }

        Array3D& operator =(Array3D&& rhs) noexcept
        {
            if (&rhs != this)
            {
                swap(rhs, *this);
            }
            return *this;
        }

        void swap(Array3D& left, Array3D& right)
        {
            std::swap(left.data_ptr, right.data_ptr);
            std::swap(left.m_cols, right.m_cols);
            std::swap(left.m_rows, right.m_rows);
            std::swap(left.m_pages, right.m_pages);
        }

        T** operator[](unsigned page)
        {
            return data_ptr[page];
        }

        const T* const* operator[](unsigned page) const
        {
            return data_ptr[page];
        }
    };

int main()
{
    try
    {
        Array3D<double> dPtr(10, 10, 10);
        dPtr[0][0][0] = 20;
        dPtr[0][0][3] = -23;
        std::cout << dPtr[0][0][0] << " " << dPtr[0][0][3] << "\n";
        Array3D<double> test = dPtr;
        std::cout << test[0][0][0] << " " << test[0][0][3] << "\n";
        Array3D<double> test2;
        test2 = test;
        std::cout << test2[0][0][0] << " " << test2[0][0][3] << "\n";
    }
    catch (std::exception& ex)
    {
        std::cout << ex.what();
    }
}

Live Example

PaulMcKenzie
  • I like the generality of this. Much nicer than the hacky version in my edit. Is there any way to overload the constructors and destructors of primitive types? I have a nicer thing I would like. – lee Aug 29 '18 at 04:22
  • You can package this in a class, as the linked answer states could / should be done to take advantage of RAII (i.e., the array is automatically deallocated when the object goes out of scope). Note you will need to add copy constructor and assignment operator to the class as well as a destructor. I was kind of lazy not to create a full-blown class, but only wanted to show the inner workings of how to start to put together something like this. – PaulMcKenzie Aug 29 '18 at 04:25
  • I was thinking of generalizing it to n dimensions for any type T. I realized what I was thinking won't work. Is there a way to generalize it to N dimensions? Because that is something I will need in the near future for N dimensional matrices of 2d vectors. – lee Aug 29 '18 at 04:31
  • @SpentDeath: Assuming N is a compile-time constant, that's not too hard, but it will take some template recursion. At the top level, you create the pool `T[dim1*...*dimN]`, at level 1 you create `T* [dim1*...*dimN-1]` until at the last level you only create `T****[dim1]`. – MSalters Aug 29 '18 at 10:12
  • @PaulMcKenzie I submitted a simple (well, fairly-simple) class implementation. – Davislor Nov 13 '18 at 02:47

That was probably a simplified version of your problem, but the data structure you’re using (“three-star” arrays) is almost never the one you want. If you’re creating a dense matrix like here, and allocating space for every element, there is no advantage at all to making millions of tiny allocations. If you want a sparse matrix, you normally want a format like compressed sparse row.

If the array is “rectangular” (or I suppose a 3-D one would be “boxy”), and all the rows and columns are the same size, this data structure is purely wasteful compared to allocating a single memory block. You perform millions of tiny allocations, allocate space for millions of pointers, and lose locality of memory.

This boilerplate creates a zero-cost abstraction for a dynamic 3-D array. (Okay, almost: it’s redundant to store both the length of the underlying one-dimensional std::vector and the individual dimensions.) The API uses a(i, j, k) as the equivalent of a[i][j][k] and a.at(i,j,k) as the variant with bounds-checking.

This API also has an option to fill the array with a function of the indices, f(i,j,k). If you call a.generate(f), it sets each a(i,j,k) = f(i,j,k). In theory, this strength-reduces the offset calculation within the inner loop, making it much faster. You can also pass the generating function to the constructor, as in array3d<float>(M, N, P, f). Extend it as you please.

#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <iomanip>
#include <iostream>
#include <vector>

using std::cout;
using std::endl;
using std::ptrdiff_t;
using std::size_t;

/* In a real-world implementation, this class would be split into a
 * header file and a definitions file.
 */
template <typename T>
  class array3d {
    public:
    using value_type = T;
    using size_type = size_t;
    using difference_type = ptrdiff_t;
    using reference = T&;
    using const_reference = const T&;
    using pointer = T*;
    using const_pointer = const T*;
    using iterator = typename std::vector<T>::iterator;
    using const_iterator = typename std::vector<T>::const_iterator;
    using reverse_iterator = typename std::vector<T>::reverse_iterator;
    using const_reverse_iterator = typename
      std::vector<T>::const_reverse_iterator;

/* For this trivial example, I don’t define a default constructor or an API
 * to resize a 3D array.
 */
    array3d( const ptrdiff_t rows,
             const ptrdiff_t cols,
             const ptrdiff_t layers )
    {
      const ptrdiff_t nelements = rows*cols*layers;

      assert(rows > 0);
      assert(cols > 0);
      assert(layers > 0);
      assert(nelements > 0);

      nrows = rows;
      ncols = cols;
      nlayers = layers;
      storage.resize(static_cast<size_t>(nelements));
    }

/* Variant that initializes an array with bounds and then fills each element
 * (i,j,k) with a provided function f(i,j,k).
 */
    array3d( const ptrdiff_t rows,
             const ptrdiff_t cols,
             const ptrdiff_t layers,
             const std::function<T(ptrdiff_t, ptrdiff_t, ptrdiff_t)> f )
    {
      const ptrdiff_t nelements = rows*cols*layers;

      assert(rows > 0);
      assert(cols > 0);
      assert(layers > 0);
      assert(nelements > 0);

      nrows = rows;
      ncols = cols;
      nlayers = layers;
      storage.reserve(static_cast<size_t>(nelements));

      for ( ptrdiff_t i = 0; i < nrows; ++i )
        for ( ptrdiff_t j = 0; j < ncols; ++j )
          for ( ptrdiff_t k = 0; k < nlayers; ++k )
            storage.emplace_back(f(i,j,k));

      assert( storage.size() == static_cast<size_t>(nelements) );
    }

    // Rule of 5:
    array3d( const array3d& ) = default;
    array3d& operator= ( const array3d& ) = default;
    array3d( array3d&& ) = default;
    array3d& operator= (array3d&&) = default;

    /* a(i,j,k) is the equivalent of a[i][j][k], except that the indices are
     * signed rather than unsigned.  WARNING: It does not check bounds!
     */
    T& operator() ( const ptrdiff_t i,
                    const ptrdiff_t j,
                    const ptrdiff_t k ) noexcept
    {
      return storage[make_index(i,j,k)];
    }

    const T& operator() ( const ptrdiff_t i,
                          const ptrdiff_t j,
                          const ptrdiff_t k ) const noexcept
    {
      return const_cast<array3d&>(*this)(i,j,k);
    }

    /* a.at(i,j,k) checks bounds.  Error-checking is by assertion, rather than
     * by exception, and the indices are signed.
     */
    T& at( const ptrdiff_t i, const ptrdiff_t j, const ptrdiff_t k )
    {
      bounds_check(i,j,k);
      return (*this)(i,j,k);
    }

    const T& at( const ptrdiff_t i,
                 const ptrdiff_t j,
                 const ptrdiff_t k ) const
    {
      return const_cast<array3d&>(*this).at(i,j,k);
    }

/* Given a function or function object f(i,j,k), fills each element of the
 * container with a(i,j,k) = f(i,j,k).
 */
    void generate( const std::function<T(ptrdiff_t,
                                         ptrdiff_t,
                                         ptrdiff_t)> f )
    {
      iterator it = storage.begin();

      for ( ptrdiff_t i = 0; i < nrows; ++i )
        for ( ptrdiff_t j = 0; j < ncols; ++j )
          for ( ptrdiff_t k = 0; k < nlayers; ++k )
            *it++ = f(i,j,k);

      assert(it == storage.end());
    }

/* Could define a larger API, e.g. begin(), end(), rbegin() and rend() from the STL.
 * Whatever you need.
 */

    private:
    ptrdiff_t nrows, ncols, nlayers;
    std::vector<T> storage;

    constexpr size_t make_index( const ptrdiff_t i,
                                 const ptrdiff_t j,
                                 const ptrdiff_t k ) const noexcept
    {
      return static_cast<size_t>((i*ncols + j)*nlayers + k);
    }

    // This could instead throw std::out_of_range, like STL containers.
    constexpr void bounds_check( const ptrdiff_t i,
                                 const ptrdiff_t j,
                                 const ptrdiff_t k ) const
    {
      assert( i >=0 && i < nrows );
      assert( j >= 0 && j < ncols );
      assert( k >= 0 && k < nlayers );
    }
};

// In a real-world scenario, this test driver would be in another source file:

constexpr float f( const ptrdiff_t i, const ptrdiff_t j, const ptrdiff_t k )
{
  return static_cast<float>( k==0 ? 1.0 : -1.0 *
                             ((double)i + (double)j*1E-4));
}

int main(void)
{
  constexpr ptrdiff_t N = 2200, M = 4410, P = 2;
  const array3d<float> a(N, M, P, f);

  // Should be: -1234.4321
  cout << std::setprecision(8) << a.at(1234,4321,1) << endl;

  return EXIT_SUCCESS;
}

It’s worth noting that this code technically contains undefined behavior: it assumes that signed-integer multiplicative overflow produces a negative number, but in fact the compiler is entitled to generate completely broken code if the program requests some absurd amount of memory at runtime.
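If you want to guard against that, one sketch (not part of the class above) is to do the size arithmetic in size_t and check each multiplication before performing it; `checked_mul` here is a made-up helper name:

#include <cstddef>
#include <limits>
#include <stdexcept>

// Multiply two sizes, throwing instead of silently overflowing.
inline std::size_t checked_mul(std::size_t a, std::size_t b)
{
  if (b != 0 && a > std::numeric_limits<std::size_t>::max() / b)
    throw std::length_error("3-D array dimensions overflow size_t");
  return a * b;
}

// e.g. const std::size_t nelements = checked_mul(checked_mul(rows, cols), layers);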

Of course, if the array bounds are constants, just declare them constexpr and use an array with fixed bounds.
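For instance, a fixed-bounds version might look like the following sketch (the bounds are the illustrative ones from the test driver above; the array is heap-allocated so it does not overflow the stack):

#include <array>
#include <cstddef>
#include <memory>

constexpr std::size_t N = 2200, M = 4410, P = 2;
using Fixed3D = std::array<std::array<std::array<float, P>, M>, N>;

int main()
{
  auto a = std::make_unique<Fixed3D>();  // one allocation, contiguous, zero-initialized
  (*a)[10][20][1] = 1.0f;
}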

It’s unfortunate that every new C++ programmer learns about char** argv first, because that makes people think that a “two-dimensional” array is a “ragged” array of pointers to rows.

In the real world, that’s almost never the best data structure for the job.

Davislor
  • Even though using such a class is, like you said, not recommended, I think simply by trying to understand it one can learn a lot about C++, and thus this is a very interesting POC. – Sam96 Jul 16 '19 at 13:36