2

I have troubles coming up with a good strategy to reduce the memory allocations for the following problem:

I am constructing a tree. At start, I only have the root which contains some data ( a list (std::vector) of indices ). I split in two parts where part of the indices go to the left child and the other part go to the right one. I do not know how many would go to left / right. Once, I am done processing the root, I no longer need to store the indices for it. In fact, I am only interested in those for the leaves. Furthermore, additional data can be added at each split! So, if the root has 20 elements, then after the split the left one may have 12 elements and the right one 10.

In each node, I keep an std::vector which contains those indices. When I am adding the elements, I push_back() the element which leads to many memory allocations.

What would be a good strategy to keep the indices?

The question is relevant to the generation of the SBVH data structure.

Code:

struct Node {
     std::vector<unsigned> indices_;
     // ... some other stuff here
}
void partition(Node* node) {
    Node* left = new Node();
    Node* right = new Node();
    float axis = findSplitPosition(node);

    for(size_t i = 0; i < node->indices_.size(); ++i) {
        if (index is strictly left on axis) {
            left->indices_.push_back(node->indices_[i]);
        } else if(index is strictly right on axis) {
            right->indices_.push_back(node->indices_[i]);
        } else {
            // it intersects the axis -> add to left and right
            left->indices_.push_back(node->indices_[i]);
            right->indices_.push_back(node->indices_[i]);
        }
    }

    // root indices no longer needed
    node->indices_.clear();

}
Jarod42
  • 203,559
  • 14
  • 181
  • 302
  • code could explain more than 1000 lines – Humam Helfawi Feb 09 '16 at 12:49
  • 4
    How about `std::vector::reserve()` ? – Yam Marcovic Feb 09 '16 at 12:50
  • Arez you sure that the allocation time is relevant for the overall performance? Or are you concerned by (whole program, process) memory consumption because the tree is huge? – Basile Starynkevitch Feb 09 '16 at 13:01
  • It does play a role. The tree may contain million of indices. The tree may go somewhat deep (it is not a balanced tree). Separating the data like that leads to more cache misses, and also the memory allocations slow down the construction. If I am using another construction algorithm which does more or less the same work except that it does not do so many allocations, it is 3 times faster. – Мартин Радев Feb 09 '16 at 13:03
  • 1
    @МартинРадев: That should go into your question! – Basile Starynkevitch Feb 09 '16 at 13:10

2 Answers2

3

If each node has to maintain a dynamic list itself, then you can use std::vector::reserve() before calling all those push_back()s.

If, however, the entire length is determined once you set up the root and that initial vector remains unchanged, and then you just "split it" between each node, then the nodes themselves can simply hold pointers to the data inside the initial vector—thereby eliminating nearly all allocations around this data structure.

Yam Marcovic
  • 7,953
  • 1
  • 28
  • 38
  • I do not know the size beforehand. The root may have 30 elements, but after doing the split the left + right might be in total 50 elements. Namely, some elements may go to the left and to the right. I can reserve the memory, but I have to go over them, count and just then. I do not want to iterate twice. I do know the entire length. In the standard BVH construction I just oreorder the elements and keep start and end indices. Here I cannnot do so. – Мартин Радев Feb 09 '16 at 12:59
  • @МартинРадев I meant that, if the basic splittable vector never gets modified (elements are added or removed), then you really don't even have to maintain separate memory blocks for each node. You can just have an offset pointer + size into the root vector. – Yam Marcovic Feb 09 '16 at 13:41
0

Basically if you cannot reserve the vectors based on some heuristics, you fall victim to Schlemiel's algorithm (a milder version though, because the geometric growth ensures a O(n) time complexity on n consecutive insertions instead of O(n^2)).

But you can get away with a constant number of allocations if first you go through the node's indices and record whether the given index should go to the left subnode, the right subnode or both. Also keep track of the left/right subnode's index count:

struct Foo {
    bool goesIntoLeft;
    bool goesIntoRight;
};

std::vector<Foo> foo;
foo.reserve(node->_indices.size());
int leftCount = 0;
int rightCount = 0;
for (auto i = 0; i < node->_indices.size(); ++i) {
    if (index goes into left) {
        foo.push_back({ true, false });
        ++leftCount;
    } else if (index goes into right) {
        foo.push_back({ false, true });
        ++rightCount;
    } else { // goes into both
        foo.push_back({ true, true });
        ++leftCount;
        ++rightCount;
    }
}

std::vector<Node> leftNodes;
leftNodes.reserve(leftCount);
std::vector<Node> rightNodes;
rightNodes.reserve(rightCount);

for (auto i = 0; i < node->_indices.size(); ++i) {
    if (foo[i].goesIntoLeft) leftNodes.push_back(nodes._indices[i]);
    if (foo[i].goesIntoRight) rightNodes.push_back(nodes._indices[i]);
}

This way you only do 3 allocations, namely foo, leftNodes, rightNodes. Though you have to iterate through the indices twice, the heavy lifting (the geometric calculations) is done in the first loop exclusively.

Tamás Zahola
  • 9,271
  • 4
  • 34
  • 46