I am porting some C99 code that makes heavy use of variable length arrays (VLA) to C++.
I replaced the VLAs (stack allocation) with an array class that allocates memory on the heap. The performance hit was huge, a slowdown of a factor of 3.2 (see benchmarks below). What fast VLA replacement can I use in C++? My goal is to minimize performance hit when rewriting the code for C++.
One idea that was suggested to me was to write an array class that contains a fixed-size storage within the class (i.e. can be stack-allocated) and uses it for small arrays, and automatically switches to heap allocation for larger arrays. My implementation of this is at the end of the post. It works fairly well, but I still cannot reach the performance of the original C99 code. To come close to it, I must increase this fixed-size storage (MSL
below) to sizes which I am not comfortable with. I don't want to allocate too-huge arrays on the stack even for the many small arrays that don't need it because I worry that it will trigger a stack overflow. A C99 VLA is actually less prone to this because it will never use more storage than needed.
I came upon std::dynarray
, but my understanding is that it was not accepted into the standard (yet?).
I know that clang and gcc support VLAs in C++, but I need it to work with MSVC too. In fact better portability is one of the main goals of rewriting as C++ (the other goal being making the program, which was originally a command line tool, into a reusable library).
Benchmark
MSL
refers to the array size above which I switch to heap-allocation. I use different values for 1D and 2D arrays.
Original C99 code: 115 seconds.
MSL = 0 (i.e. heap allocation): 367 seconds (3.2x).
1D-MSL = 50, 2D-MSL = 1000: 187 seconds (1.63x).
1D-MSL = 200, 2D-MSL = 4000: 143 seconds (1.24x).
1D-MSL = 1000, 2D-MSL = 20000: 131 (1.14x).
Increasing MSL
further improves performance more, but eventually the program will start returning wrong results (I assume due to stack overflow).
These benchmarks are with clang 3.7 on OS X, but gcc 5 shows very similar results.
Code
This is the current "smallvector" implementation I use. I need 1D and 2D vectors. I switch to heap-allocation above size MSL
.
template<typename T, size_t MSL=50>
class lad_vector {
const size_t len;
T sdata[MSL];
T *data;
public:
explicit lad_vector(size_t len_) : len(len_) {
if (len <= MSL)
data = &sdata[0];
else
data = new T[len];
}
~lad_vector() {
if (len > MSL)
delete [] data;
}
const T &operator [] (size_t i) const { return data[i]; }
T &operator [] (size_t i) { return data[i]; }
operator T * () { return data; }
};
template<typename T, size_t MSL=1000>
class lad_matrix {
const size_t rows, cols;
T sdata[MSL];
T *data;
public:
explicit lad_matrix(size_t rows_, size_t cols_) : rows(rows_), cols(cols_) {
if (rows*cols <= MSL)
data = &sdata[0];
else
data = new T[rows*cols];
}
~lad_matrix() {
if (rows*cols > MSL)
delete [] data;
}
T const * operator[] (size_t i) const { return &data[cols*i]; }
T * operator[] (size_t i) { return &data[cols*i]; }
};