I want to store a 3d-volume in memory. I use a linear array for this purpose and then calculate the 1d-index from the 3d-index. It is wrapped in a class called Volume
that provides functions for accessing the data elements of the array. Here is the function for accessing one data element of the volume:
template<typename T>
inline T& Volume<T>::at(size_t x, size_t y, size_t z) {
if (x >= this->xMax || y >= this->yMax || z >= this->zMax) throw std::out_of_range("Volume index out of bounds");
return this->volume[x * this->yMax*this->zMax + y*this->zMax + z]
}
Now, this linearises the 3d volume with a Z-fastest index order. If the volume is accessed in a loop like this, it is iterated sequentially over the volume elements as they lie in memory:
Volume<float> volume(10, 20, 30); //parameters define size
for(int x = 0; x < volume.xSize(); ++x) {
for(int y = 0; y < volume.ySize(); ++y) {
for int z = 0; z < volume.zSize(); ++z) {
volume.at(x, y, z); //do sth with this voxel
}
}
}
However, if I wrote the loops like this, they would not be accessed sequentially but in a more "random" order:
Volume<float> volume(10, 20, 30); //parameters define size
for(int z = 0; z < volume.zSize(); ++z) {
for(int y = 0; y < volume.ySize(); ++y) {
(for int x = 0; x < volume.zSize(); ++x) {
volume.at(x, y, z); //do sth with this voxel
}
}
}
Now, the first case runs fast, the second case slow. My first question is: why? I guess it has got something to do with caching, but I'm not sure.
Now, I could rewrite the access function for the volume elements like this:
template<typename T>
inline T& Volume<T>::at(size_t x, size_t y, size_t z) {
if (x >= this->xMax || y >= this->yMax || z >= this->zMax) throw std::out_of_range("Volume index out of bounds");
return this->volume[x * this->yMax*this->zMax + y*this->zMax + z]
}
Then loop order #2 would be fast (because access happens sequentially), but loop order #1 slow.
Now, for some reason I need both index orders in my program. And both should be fast. The idea is that it shall be possible to define the index ordering when the volume is created and this index ordering will then be used. First I tried a simple if-else statement in the at
function. However, that did not seem to do the trick.
So I tried something like this when the ordering mode is set:
template<typename T>
void Volume<T>::setMemoryLayout(IndexOrder indexOrder) {
this->mode = indexOrder;
if (indexOrder == IndexOrder::X_FASTEST) {
this->accessVoxel = [this](size_t x, size_t y, size_t z)->T& {
return this->volume[z * this->yMax*this->xMax + y*this->xMax + x];
};
} else {
this->accessVoxel = [this](size_t x, size_t y, size_t z)->T& {
return this->volume[x * this->yMax* this->zMax + y*this->zMax + z];
};
}
}
And then when a voxel is actually accessed:
template<typename T>
inline T& Volume<T>::at(size_t x, size_t y, size_t z) {
if (x >= this->xMax || y >= this->yMax || z >= this->zMax) throw std::out_of_range("Volume index out of bounds");
return this->accessVoxel(x, y, z);
}
So my idea was to reduce that overhead from the if-statement that would be necessary inside the at
function by dynamically defining a lambda function once when the current mode is changed. It then only has to be called when at
is called. However, this did not achieve what I desired.
My question is why my attempts didn't work, and if there is a way I can actually do what I want: a volume that supports X-fastest as well as Y-fastest index ordering and is offering the corresponding performance gain when looped over accordingly.
NOTE: My goal is not to be able to switch between the two modes while there is data assigned to the volume with the data still being read correctly.