I have a huge tree that can take up to several gigabytes. The node structure is as below. You'll notice that I made the last member an array of size 1. The reason is that I can over-allocate a Node with a flexible size, similar to the flexible array member that C natively supports. I could use std::unique_ptr<T[]> or std::vector<T> instead, but the problem is that those incur a double dynamic allocation, double indirection, and extra cache misses per tree node. In my last test, this made my program take about 50% more time, which is simply not acceptable for my application.
template<typename T>
class Node
{
public:
    Node<T> *parent;
    Node<T> *child;
    /* ... */
    T &operator[](int);
private:
    int size;
    T array[1]; // over-allocated at runtime, flexible-array style
};
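For context, here is roughly how such a node gets over-allocated (a sketch only; create_node is a made-up helper name, and in my real code size is set inside a member function since it is private):

#include <cstddef> // std::size_t
#include <new>     // ::operator new

template<typename T>
Node<T> *create_node(int n) // assumes n >= 1
{
    // One allocation holds the node header plus room for n elements,
    // in the style of the C struct hack / flexible array member.
    std::size_t bytes = sizeof(Node<T>) + (n - 1) * sizeof(T);
    Node<T> *node = static_cast<Node<T> *>(::operator new(bytes));
    node->parent = nullptr;
    node->child = nullptr;
    // node->size = n; // set by a member function in my real code,
    //                 // since size is private
    return node;
}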
The simplest way to implement operator[] would be this.
template<typename T>
T &Node<T>::operator[](int n)
{
    return array[n];
}
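For example, element access then looks like this (again just a sketch, using the hypothetical create_node helper from above):

void example()
{
    Node<int> *node = create_node<int>(100); // room for 100 ints
    (*node)[42] = 7;                         // touches memory far past array[0]
    ::operator delete(node);                 // matching deallocation
}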
It should work fine in most sane C++ implementations. But since the C++ standard allows insane implementations that do array bounds checking, as far as I know this technically invokes undefined behaviour: array is declared with only one element, so array[n] for n >= 1 is an out-of-bounds access on the declared array. Then can I do this?
template<typename T>
T &Node<T>::operator[](int n)
{
    return (&array[0])[n];
}
I'm a little confused here. The [] operator for primitive types is just syntactic sugar for *. Thus (&array[0])[n] is equivalent to (&*(array + 0))[n], which I think can be cleaned up as array[n], making everything the same as the first one.
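To spell that reasoning out step by step (just a restatement in comment form, nothing new):

// The built-in subscript on a pointer p is defined as p[n] == *(p + n), so:
//   (&array[0])[n]
// == (&*(array + 0))[n]   // expand the inner array[0]
// == (array + 0)[n]       // & and * cancel out
// == *(array + 0 + n)     // expand the outer subscript
// == array[n]             // exactly the first version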
Okay, but I can still do this.
template<typename T>
T &Node<T>::operator[](int n)
{
    // Compute the element address from the raw object address instead of
    // naming the declared array; offsetof comes from <cstddef>.
    return *(reinterpret_cast<T *>(reinterpret_cast<char *>(this) + offsetof(Node<T>, array)) + n);
}
I hope I'm now free from the possible undefined behaviours. Perhaps inline assembly would show my intent better. But do I really have to go this far? Can someone clarify things for me?
By the way, T is always a POD type. The whole Node is also POD.
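If it helps, that requirement on T could even be enforced in code, e.g. by adding a line like this inside Node (a sketch; it is not currently in my class):

static_assert(std::is_pod<T>::value, "Node<T> requires a POD element type"); // needs <type_traits>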