boost rtree get a node from the tree, why `rtree.begin()` is so slow?

Question

I am using the Rtree algorithm implemented in boost::geometry::index library boost::geometry::index library (bgi) .

The tree is very big with up to billions of 2d points. In the code I need to pick up a random point on the tree, and do some query based on the selected point, the returned points by the query, together with the selected point will be deleted from the tree. This procedure is repeated until the tree is empty.

At the beginning, I had the code as in version2, where I have a companying unordered_set to store the point ids, whenever I delted any point from the tree, their corresponding ids will also be delted from the unordered_set. Later I realized there is the rtree.empty() method, I rewrote my code to make use of it because the unordered_set actually takes a lot of memory when the number of points is extremly large. This rewriting led to version 1 as the example code. However, I found version 1 took much more to to execute.

I tried to write the following simplfied code to compare the speed of the two versions, where the skipped code actually contains very complex queries which I do not want to show. The skipped parts in the two version are almost the same except that in version 2, whenever a point gets deleted from the tree, its id gets deleted from the unordered_set too. Therefore, I suspect it is rtree.begin() which makes version 1 slower.

I basically prefers version 1 as it is simpler and it saves memory. I am wondering why rtree.begin() is so slow? Is there any other way I can stick to version 1 but with no slower performance?

I am learning C++ and my code might look very stupid.

I compile the code with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, boost 1.80, with flag -Wall -Wextra -g3 .

#include <iostream>
#include <vector>
#include <ctime>
#include <chrono>
#include <unordered_set>

#include <boost/geometry.hpp>
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_int.hpp>
#include <boost/random/variate_generator.hpp>

namespace bg  = boost::geometry;
namespace bgi = bg::index;
namespace bgm = bg::model;

//typedef bgm::point<float, 2, bg::cs::spherical_equatorial<bg::degree> > Point;
typedef bgm::point<float, 2, bg::cs::cartesian> Point;
typedef std::pair<Point, int> PtPair;

boost::mt19937 gen;
int roll_die(int min, int max) {
    boost::uniform_int<> dist(min, max);
    boost::variate_generator<boost::mt19937&, boost::uniform_int<> > die(gen, dist);
    return die();
}


int main() {
    using Tree = bgi::rtree<PtPair, bgi::rstar<16> >;
    const int NP = 1000000;
    std::vector<PtPair> pts;
    
    pts.reserve(NP);

    int rdmin = 1;
    int rdmax = 1000;
    gen.seed(static_cast<unsigned int>(std::time(NULL)));

    for(int i = 0; i < NP; ++i) {
        pts.push_back(PtPair(Point(roll_die(rdmin, rdmax), roll_die(rdmin, rdmax)), i));
    }

    Tree rtree(pts);
    std::cout << "Tree contains " << rtree.size() << " points." << std::endl;

    std::chrono::time_point<std::chrono::system_clock> start, end;
    //Point pt_temp;
    //int pt_id;

    std::cout << "version 1 ====================================\n";
    start = std::chrono::system_clock::now();
    
    while (!rtree.empty()) {
        auto it_tree = rtree.begin();
        //std::cout << "node ID: " << (rtree.begin())->second << std::endl;
        //std::cout << "node ID: " << it_tree->second << std::endl;

        auto pt_temp = it_tree->first;
        auto pt_id = it_tree->second;

        /*
        code skipped: do some queries and calculation, remove the points from the query results
        */
        
        rtree.remove(*it_tree);
        //std::cout << "Tree contains " << rtree.size() << "points." << std::endl;
        }
    
    end = std::chrono::system_clock::now();
    std::chrono::duration<double> elapsed_seconds = end - start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";

    if(rtree.empty()) {
        std::cout << "empty rtree " << std::endl;
    }

    std::cout << "version 2 ====================================\n";

    std::vector<int> vec_temp(NP);
    std::iota(vec_temp.begin(), vec_temp.end(), 0);
    std::unordered_set<int> set_op(std::begin(vec_temp), std::end(vec_temp));
    Tree rtree2(pts);
    start = std::chrono::system_clock::now();

    // "while (!rtree2.empty())" seems to be as fast as "while (!set_op.empty())"
    //while (!set_op.empty()) {
    while (!rtree2.empty()) {
        auto it = set_op.begin();
        //std::cout << "node ID: " << (rtree.begin())->second << std::endl;
        //std::cout << "node ID: " << it_tree->second << std::endl;
        
        auto pt_temp = pts[*it].first;
        auto pt_id = pts[*it].second;

        /*
        same code skipped: do some queries and calculation, remove the points from the query results, together with removing the corresponding pt_id from set_op.

        */

        rtree2.remove(pts[*it]);
        set_op.erase(it);
        //std::cout << "Tree contains " << rtree.size() << "points." << std::endl;
        }
    
    end = std::chrono::system_clock::now();
    elapsed_seconds = end - start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";

    if(rtree2.empty()) {
        std::cout << "empty rtree2 " << std::endl;
    }
}

I tried to measure the performance of the two different versions of code, to identify the part which leads to slower executation.

I want to know why it is so slow and is there any way to improve it.

sehe · Accepted Answer · 2023-05-16T18:42:33.480

I basically prefers version 1 as it is simpler and it saves memory. I am wondering why rtree.begin() is so slow?

What gave you the impression that rtree.begin() is slow? You don't measure that at all. Here's your code restructured for easier profiling:

Live On Coliru

#define NDEBUG
#define BOOST_DISABLE_ASSERTS 1
#include <boost/geometry.hpp>
#include <chrono>
#include <iostream>
#include <random>
#include <unordered_set>

namespace bg  = boost::geometry;
namespace bgi = bg::index;
namespace bgm = bg::model;

typedef bgm::point<float, 2, bg::cs::cartesian> Point;
typedef std::pair<Point, int>                   PtPair;

auto now = std::chrono::steady_clock::now;
using namespace std::chrono_literals;

using Data = std::vector<PtPair>;
using Tree = bgi::rtree<PtPair, bgi::rstar<16>>;

static Data generate_points(int const NP, unsigned seed, int rdmin = 1, int rdmax = 1'000) {
    Data pts;
    pts.reserve(NP);

    auto roll_die = bind(std::uniform_int_distribution(rdmin, rdmax), std::mt19937{seed});

    for (int i = 0; i < NP; ++i)
        pts.emplace_back(Point(roll_die(), roll_die()), i);

    return pts;
}

void version1(Data const& pts) {
    Tree rtree(pts);
    std::cout << "Tree contains " << rtree.size() << " points." << std::endl;

    std::cout << "version 1 ====================================\n";
    auto start = now();

    while (!rtree.empty()) {
        auto it_tree = rtree.begin();
        // std::cout << "node ID: " << (rtree.begin())->second <<
        // std::endl; std::cout << "node ID: " << it_tree->second <<
        // std::endl;

        [[maybe_unused]] auto pt_temp = it_tree->first;
        [[maybe_unused]] auto pt_id   = it_tree->second;

        /*
           code skipped: do some queries and calculation, remove the
           points from the query results
           */

        rtree.remove(*it_tree);
        // std::cout << "Tree contains " << rtree.size() << "points."
        // << std::endl;
    }

    std::cout << "elapsed time: " << (now() - start) / 1.s << "s\n";

    if (rtree.empty()) {
        std::cout << "empty rtree " << std::endl;
    }
}

void version2(Data const& pts) {
    std::cout << "version 2 ====================================\n";

    std::unordered_set<int> set_op;
    for (size_t i = 0; i < pts.size(); ++i)
        set_op.insert(i);
    Tree rtree(pts);
    std::cout << "Tree contains " << rtree.size() << " points." << std::endl;

    auto start = now();

    // while (!set_op.empty()) { // seems as fast
    while (!rtree.empty()) {
        auto it = set_op.begin();
        // std::cout << "node ID: " << (rtree.begin())->second <<
        // std::endl; std::cout << "node ID: " << it_tree->second <<
        // std::endl;

        [[maybe_unused]] auto pt_temp = pts[*it].first;
        [[maybe_unused]] auto pt_id   = pts[*it].second;

        /*
           same code skipped: do some queries and calculation, remove the
           points from the query results, together with removing the
           corresponding pt_id from set_op.
           */

        rtree.remove(pts[*it]);
        set_op.erase(it);
        // std::cout << "Tree contains " << rtree.size() << "points." << std::endl;
    }

    std::cout << "elapsed time: " << (now() - start) / 1.s << "s\n";

    if (rtree.empty()) {
        std::cout << "empty rtree " << std::endl;
    }
}

int main(int argc, char** argv) {
    std::list<std::string_view> args(argv + 1, argv + argc);

    auto seed = std::random_device{}();
    Data pts = generate_points(1'000'000, seed);

    for (auto arg:args) {
        if (arg=="version1") version1(pts);
        if (arg=="version2") version2(pts);

        seed = ::atoi(arg.data());
        std::cout << "Using seed " << seed << "\n";
        pts = generate_points(1'000'000, seed);
    }
}

Which can be used like e.g.

g++ -std=c++20 -O2 main.cpp && ./a.out 42 version1 version2
Using seed 42
Tree contains 1000000 points.
version 1 ====================================
elapsed time: 1.28516s
empty rtree 
Using seed 0
version 2 ====================================
Tree contains 1000000 points.
elapsed time: 1.02424s
empty rtree 
Using seed 0

Note that while version2 does consistently look quicker, there could be surprises, such as data dependencies (e.g. compare ./a.out 42 version1 1 version1) or CPU/cache/heap warm-up (e.g. compare ./a.out 42 version1 version2 vs. ./a.out 42 version2 version1).

Profiling

Comparing the output of

valgrind --tool=callgrind --dump-instr=yes ./release/sotest 42 version1
valgrind --tool=callgrind --dump-instr=yes ./release/sotest 42 version2

We see that, highlevel indeed version2 allocates more, but version1 spends significantly more time doing deletion:

Looking at those deletes,

Version1 just does more of them:

Looking at the callsites, the code structure is not the difference:

Both versions call raw_remove exactly 1'000'000 times:

So, we can conclude with near certainty that the difference is data dependent. Because pts is identical both versions, it must be down to the order in which we remove points. For unordered_set the order is basically unspecified but comparing with std::set (version3 below) shows identical behaviour.

It follows that following iteration order for the rtree is somewhat of a worst-case order in which to be removing the nodes.

Note that the tree algorithm and tuning makes a big difference too. Using rstar<32> or even bigger increases the disadvantage for version1 and removes any difference of other versions:

Out Of The Box, Conclusions, Fixes

If the order is not important, I'd suggest version4:

void version4(Data const& pts) {
    std::cout << "version 4 ====================================\n";

    Tree rtree(pts);
    std::cout << "Tree contains " << rtree.size() << " points." << std::endl;

    auto start = now();

    for (auto& pt : pts) {
        [[maybe_unused]] auto [pt_temp, pt_id] = pt;
        rtree.remove(pt);
    }

    std::cout << "elapsed time: " << (now() - start) / 1.s << "s\n";

    if (rtree.empty()) {
        std::cout << "empty rtree " << std::endl;
    }
}

In fact, the fastest way is not removing the nodes in the first place. Even using tree iteration order will be way faster here:

void version5(Data const& pts) {
    std::cout << "version 5 ====================================\n";

    Tree rtree(pts);
    std::cout << "Tree contains " << rtree.size() << " points." << std::endl;

    auto start = now();

    std::set<int> ids_soft_removed;
    for (auto& [pt, id] : rtree) {
        ids_soft_removed.insert(id);
    }

    std::cout << "elapsed time: " << (now() - start) / 1.s << "s\n";

    std::cout << "ids soft-removed: " << ids_soft_removed.size() << std::endl;
}

Of course, this is context-free brainstorming, because in this code we could just drop the rtree altogether.

Full Live Demo

#define NDEBUG
#define BOOST_DISABLE_ASSERTS 1
#include <boost/geometry.hpp>
#include <chrono>
#include <iostream>
#include <random>
#include <unordered_set>

namespace bg  = boost::geometry;
namespace bgi = bg::index;
namespace bgm = bg::model;

typedef bgm::point<float, 2, bg::cs::cartesian> Point;
typedef std::pair<Point, int>                   PtPair;

auto now = std::chrono::steady_clock::now;
using namespace std::chrono_literals;

using Data = std::vector<PtPair>;
using Tree = bgi::rtree<PtPair, bgi::rstar<16>>;

static Data generate_points(int const NP, unsigned seed, int rdmin = 1, int rdmax = 1'000) {
    Data pts;
    pts.reserve(NP);

    auto roll_die = bind(std::uniform_int_distribution(rdmin, rdmax), std::mt19937{seed});

    for (int i = 0; i < NP; ++i)
        pts.emplace_back(Point(roll_die(), roll_die()), i);

    return pts;
}

void version1(Data const& pts) {
    Tree rtree(pts);
    std::cout << "Tree contains " << rtree.size() << " points." << std::endl;

    std::cout << "version 1 ====================================\n";
    auto start = now();

    while (!rtree.empty()) {
        auto it_tree = rtree.begin();
        // std::cout << "node ID: " << (rtree.begin())->second <<
        // std::endl; std::cout << "node ID: " << it_tree->second <<
        // std::endl;

        [[maybe_unused]] auto pt_temp = it_tree->first;
        [[maybe_unused]] auto pt_id   = it_tree->second;

        /*
           code skipped: do some queries and calculation, remove the
           points from the query results
           */

        rtree.remove(*it_tree);
        // std::cout << "Tree contains " << rtree.size() << "points."
        // << std::endl;
    }

    std::cout << "elapsed time: " << (now() - start) / 1.s << "s\n";

    if (rtree.empty()) {
        std::cout << "empty rtree " << std::endl;
    }
}

void version2(Data const& pts) {
    std::cout << "version 2 ====================================\n";

    std::unordered_set<int> set_op;
    for (size_t i = 0; i < pts.size(); ++i)
        set_op.insert(i);
    Tree rtree(pts);
    std::cout << "Tree contains " << rtree.size() << " points." << std::endl;

    auto start = now();

    // while (!set_op.empty()) { // seems as fast
    while (!rtree.empty()) {
        auto it = set_op.begin();
        // std::cout << "node ID: " << (rtree.begin())->second <<
        // std::endl; std::cout << "node ID: " << it_tree->second <<
        // std::endl;

        [[maybe_unused]] auto pt_temp = pts[*it].first;
        [[maybe_unused]] auto pt_id   = pts[*it].second;

        /*
           same code skipped: do some queries and calculation, remove the
           points from the query results, together with removing the
           corresponding pt_id from set_op.
           */

        rtree.remove(pts[*it]);
        set_op.erase(it);
        // std::cout << "Tree contains " << rtree.size() << "points." << std::endl;
    }

    std::cout << "elapsed time: " << (now() - start) / 1.s << "s\n";

    if (rtree.empty()) {
        std::cout << "empty rtree " << std::endl;
    }
}

void version3(Data const& pts) {
    std::cout << "version 3 ====================================\n";

    std::set<int> set_op;
    for (size_t i = 0; i < pts.size(); ++i)
        set_op.insert(i);
    Tree rtree(pts);
    std::cout << "Tree contains " << rtree.size() << " points." << std::endl;

    auto start = now();

    while (!rtree.empty()) {
        auto it = set_op.begin();

        [[maybe_unused]] auto pt_temp = pts[*it].first;
        [[maybe_unused]] auto pt_id   = pts[*it].second;

        rtree.remove(pts[*it]);
        set_op.erase(it);
    }

    std::cout << "elapsed time: " << (now() - start) / 1.s << "s\n";

    if (rtree.empty()) {
        std::cout << "empty rtree " << std::endl;
    }
}

void version4(Data const& pts) {
    std::cout << "version 4 ====================================\n";

    Tree rtree(pts);
    std::cout << "Tree contains " << rtree.size() << " points." << std::endl;

    auto start = now();

    for (auto& pt : pts) {
        [[maybe_unused]] auto [pt_temp, pt_id] = pt;
        rtree.remove(pt);
    }

    std::cout << "elapsed time: " << (now() - start) / 1.s << "s\n";

    if (rtree.empty()) {
        std::cout << "empty rtree " << std::endl;
    }
}

void version5(Data const& pts) {
    std::cout << "version 5 ====================================\n";

    Tree rtree(pts);
    std::cout << "Tree contains " << rtree.size() << " points." << std::endl;

    auto start = now();

    std::set<int> ids_soft_removed;
    for (auto& [pt, id] : rtree) {
        ids_soft_removed.insert(id);
    }

    std::cout << "elapsed time: " << (now() - start) / 1.s << "s\n";

    std::cout << "ids soft-removed: " << ids_soft_removed.size() << std::endl;
}

int main(int argc, char** argv) {
    std::list<std::string_view> args(argv + 1, argv + argc);

    auto seed = std::random_device{}();
    Data pts = generate_points(1'000'000, seed);

    for (auto arg:args) {
        if (arg=="version1") version1(pts);
        if (arg=="version2") version2(pts);
        if (arg=="version3") version3(pts);
        if (arg=="version4") version4(pts);
        if (arg=="version5") version5(pts);

        seed = ::atoi(arg.data());
        std::cout << "Using seed " << seed << "\n";
        pts = generate_points(1'000'000, seed);
    }
}

Printing:

g++ -std=c++20 -O2 main.cpp && ./a.out 42 version{1..5}
Using seed 42
Tree contains 1000000 points.
version 1 ====================================
elapsed time: 1.29204s
empty rtree 
Using seed 0
version 2 ====================================
Tree contains 1000000 points.
elapsed time: 1.03919s
empty rtree 
Using seed 0
version 3 ====================================
Tree contains 1000000 points.
elapsed time: 1.0509s
empty rtree 
Using seed 0
version 4 ====================================
Tree contains 1000000 points.
elapsed time: 0.934573s
empty rtree 
Using seed 0
version 5 ====================================
Tree contains 1000000 points.
elapsed time: 0.750064s
ids soft-removed: 1000000
Using seed 0

As you can see, the fastest order in which to delete is to not delete at all.

thanks a lot, that's really overwhelming. I probably need to spend more time learning from your answer, especially how to profile the code, I am a beginner and barely know anything about that. Aside from that, I will try to communicate more with further comments. — learner_lyf, May 16 '23 at 19:38
`What gave you the impression that rtree.begin() is slow? You don't measure that at all. Here's your code restructured for easier profiling` --- My impression came from that the two versions are almost identical except one uses `rtree.begin()` and the other uses `unordered_set.front()`, now I know I couldn't come to that conclusion without systematically profiling the code using proper tools. — learner_lyf, May 16 '23 at 20:09
Regarding your suggestion of better not removing the point, that was actually what I did in my code a few weeks ago. Sorry I did not show the skipped code which contains an important context. I am actually clustering the points based on their distances using Rtree. If I do not delete already visited points, they will be returned in many later queries, which is not necessary. I compared that if I deleted the already clustered points, the code runs more than 10x faster. — learner_lyf, May 16 '23 at 20:10
Looping over the points in the vector container or in the Rtree container both do not work, because as I wrote in the comment, some points on the Rtree might be deleted in the query part in skipped code, which will invalidate the Rtree iterator. — learner_lyf, May 16 '23 at 20:11
If the observation is that deletion is slow, keeping a set of deleted points makes sense. Of course, I cannot profile it for you without a representative benchmark. — sehe, May 16 '23 at 21:23

boost rtree get a node from the tree, why `rtree.begin()` is so slow?

1 Answers1

Profiling

Out Of The Box, Conclusions, Fixes