3

Declaring multidimensional arrays with static size is quite easy in C++ and the array then is stored in one continuous block of memory (row major layout).

Problem

However declaring dynamically allocated multidimensional arrays (size known only at runtime) in C++ is quite tricky as discussed in other SO thread regarding arrays. To preserve the same syntax with multiple square brackets (in case of 2D array) you need to create an array of pointers, which point to another set of arrays (rows). With more dimensions it adds more (unnecessary) levels of indirection, memory fragmentation, and with small sizes of the array the pointers can take more memory than then the actual data.

One of the solutions is to use 1D array and then recalculate the indices.

Example:

3D array with sizes 10, 3 and 5. I want an element at positions 3, 1, 4 instead of writing 3darray[3][1][4] I would write 3darray[index], where index would be calculated as 3*(y_dym_size*z_dym_size) + 1*(z_dym_size) + 4 which, when substituted, results to 3*(3*5)+1*(5)+4.

I can easily make a class that encapsulates a dynamically allocated array and recomputes in indices in the presented manner, but this is not practical, as it needs to be written for every number of dimensions.

Question:

I would like to create a template that would work for arbitrary number of dimensions with zero overhead (which is the spirit of modern C++ - having reusable code/classes where more work is being shifted to the compiler). I have the following code that works for n-dimensional array, however does not have 0 overhead. It contains for loop and also have an array which is being used in the 1D resolution:

template <class T, size_t DIM>
class arrayND{
    std::array<size_t, DIM> sizes;
    std::array<size_t, DIM-1> access_multiplier;
    vector<T> data;

  public:
    using iterator = typename vector<T>::iterator;
    using const_iterator = typename vector<T>::const_iterator;

    template <typename... Args, typename std::enable_if_t<sizeof...(Args) == DIM, int> = 0>
    arrayND(Args&&... args) {
        std::array<size_t, DIM> temp{args...};
        sizes = temp;
        size_t mult = 1;
        for(int i = DIM-2; i >= 0; --i){
            mult *= sizes[i+1];
            access_multiplier[i] = mult;
        }
        data.resize(mult*temp[0]);
    }

    template <typename... Args, typename std::enable_if_t<sizeof...(Args) == DIM, int> = 0>
    T& get(Args&&... args){
        std::array<size_t, DIM> idx_copy{args...};
        size_t index = idx_copy[DIM-1];
        for(int i = DIM-2; i >= 0; --i){
            index += idx_copy[i]*access_multiplier[i];
        }
        return data[index];
    }

    template <typename... Args, typename std::enable_if_t<sizeof...(Args) == DIM, int> = 0>
    T& operator()(Args&&... args){
        return get(args...);
    }

    void set(const T& elem){
        fill(begin(data), end(data), elem);
    }

    iterator begin(){
        return begin(data);
    }

    iterator end(){
        return end(data);
    }

    const_iterator begin() const{
        return cbegin(data);
    }

    const_iterator end() const{
        return cend(data);
    }
};

Other approach I was thinking of was to utilise variadic templates, which would be hopefully - after compiler optimization - identical to code written specially for some number of dimensions:

int getIndex(size_t index){
    return index;
}

template<typename... Args>
int getIndex(size_t index, Args... args){
    return access_multiplier[DIM-sizeof...(Args)-1]*index + getIndex(args...);
}

template <typename... Args, typename std::enable_if_t<sizeof...(Args) == DIM, int> = 0>
T& get(Args&&... args){
    return data[getIndex(args...)];
    /*std::array<size_t, DIM> idx_copy{args...};
    size_t index = idx_copy[DIM-1];
    for(int i = DIM-2; i >= 0; --i){
        index += idx_copy[i]*access_multiplier[i];
    }
    return data[index];*/
}

Is there a way in the current version (C++17) or the C++ language how to obtain both flexibility (arbitrary number of dimensions) and performance (zero overhead compares to code written specially for some number of dimensions)? If there has to be overhead then it makes more sense to hardcode it for lets say up to 5 dimensions. Is there already an implementation of dynamics multidimensional array in some existing library?

Šimon Hrabec
  • 626
  • 1
  • 7
  • 10
  • Static sized arrays don't need a vector. And there are many ways to implement them with `[]`. Either you are confused or there are constraints that require your gymnastics. It is hard to solve a problem with unstated constraints. – Yakk - Adam Nevraumont May 05 '18 at 13:00
  • @Yakk: He wants to be a abe to declare `arrayND<3> small(2,2), big(200,200);` - The number of dimensions is part of the type, but the size of each dimension is passed to the constructor. That needs a vector (or other dynamically allocated buffer). – Martin Bonner supports Monica May 05 '18 at 13:06
  • Thanks Martin for an example, exactly as you wrote. @yakk I have updated the question to explicitly state that this problem is about dynamically allocated array (which I missed). – Šimon Hrabec May 05 '18 at 13:20

1 Answers1

1

Split the view from the storage.

An n-dimensional array view of T is a class with a pointer to T and some way of getting n-1 stride sizes. [] returns an n-1 dimensional array view.

There are two different flavours of such views. The first stores the strides, the second a pointer to a contiguous buffer of strides. Both have their advantages; the first with care can even optimize when some or all dimensions are fixed. But I'll do the 2nd.

template<class T, std::size_t N>
struct slice {
  T* ptr=0;
  std::size_t const* strides=0;
  slice<T,N-1> operator[]( std::size_t i )const{
    return { ptr + i**strides, strides+1 };
  }
};
template<class T>
struct slice<T,1> {
  T* ptr=0;
  std::size_t const* strides=0;
  T& operator[]( std::size_t i )const{
    return *(ptr + i**strides);
  }
};

this one permits per-element strides.

Now you just have to expose a stride<T,N> to do chained [] on. This is similar to how I'd write it for 3 dimensions.

If you prefer (x,y,z) syntax and your only problem is the for loop and are afraid the compiler did not flatten it, you can write it force flattened using pack expansion. But profile and examine optimized assembly first.

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524