I basically prefers version 1 as it is simpler and it saves memory. I am wondering why rtree.begin() is so slow?
What gave you the impression that rtree.begin()
is slow? You don't measure that at all. Here's your code restructured for easier profiling:
Live On Coliru
#define NDEBUG
#define BOOST_DISABLE_ASSERTS 1
#include <boost/geometry.hpp>
#include <chrono>
#include <iostream>
#include <random>
#include <unordered_set>
namespace bg = boost::geometry;
namespace bgi = bg::index;
namespace bgm = bg::model;
typedef bgm::point<float, 2, bg::cs::cartesian> Point;
typedef std::pair<Point, int> PtPair;
auto now = std::chrono::steady_clock::now;
using namespace std::chrono_literals;
using Data = std::vector<PtPair>;
using Tree = bgi::rtree<PtPair, bgi::rstar<16>>;
static Data generate_points(int const NP, unsigned seed, int rdmin = 1, int rdmax = 1'000) {
Data pts;
pts.reserve(NP);
auto roll_die = bind(std::uniform_int_distribution(rdmin, rdmax), std::mt19937{seed});
for (int i = 0; i < NP; ++i)
pts.emplace_back(Point(roll_die(), roll_die()), i);
return pts;
}
void version1(Data const& pts) {
Tree rtree(pts);
std::cout << "Tree contains " << rtree.size() << " points." << std::endl;
std::cout << "version 1 ====================================\n";
auto start = now();
while (!rtree.empty()) {
auto it_tree = rtree.begin();
// std::cout << "node ID: " << (rtree.begin())->second <<
// std::endl; std::cout << "node ID: " << it_tree->second <<
// std::endl;
[[maybe_unused]] auto pt_temp = it_tree->first;
[[maybe_unused]] auto pt_id = it_tree->second;
/*
code skipped: do some queries and calculation, remove the
points from the query results
*/
rtree.remove(*it_tree);
// std::cout << "Tree contains " << rtree.size() << "points."
// << std::endl;
}
std::cout << "elapsed time: " << (now() - start) / 1.s << "s\n";
if (rtree.empty()) {
std::cout << "empty rtree " << std::endl;
}
}
void version2(Data const& pts) {
std::cout << "version 2 ====================================\n";
std::unordered_set<int> set_op;
for (size_t i = 0; i < pts.size(); ++i)
set_op.insert(i);
Tree rtree(pts);
std::cout << "Tree contains " << rtree.size() << " points." << std::endl;
auto start = now();
// while (!set_op.empty()) { // seems as fast
while (!rtree.empty()) {
auto it = set_op.begin();
// std::cout << "node ID: " << (rtree.begin())->second <<
// std::endl; std::cout << "node ID: " << it_tree->second <<
// std::endl;
[[maybe_unused]] auto pt_temp = pts[*it].first;
[[maybe_unused]] auto pt_id = pts[*it].second;
/*
same code skipped: do some queries and calculation, remove the
points from the query results, together with removing the
corresponding pt_id from set_op.
*/
rtree.remove(pts[*it]);
set_op.erase(it);
// std::cout << "Tree contains " << rtree.size() << "points." << std::endl;
}
std::cout << "elapsed time: " << (now() - start) / 1.s << "s\n";
if (rtree.empty()) {
std::cout << "empty rtree " << std::endl;
}
}
int main(int argc, char** argv) {
std::list<std::string_view> args(argv + 1, argv + argc);
auto seed = std::random_device{}();
Data pts = generate_points(1'000'000, seed);
for (auto arg:args) {
if (arg=="version1") version1(pts);
if (arg=="version2") version2(pts);
seed = ::atoi(arg.data());
std::cout << "Using seed " << seed << "\n";
pts = generate_points(1'000'000, seed);
}
}
Which can be used like e.g.
g++ -std=c++20 -O2 main.cpp && ./a.out 42 version1 version2
Using seed 42
Tree contains 1000000 points.
version 1 ====================================
elapsed time: 1.28516s
empty rtree
Using seed 0
version 2 ====================================
Tree contains 1000000 points.
elapsed time: 1.02424s
empty rtree
Using seed 0
Note that while version2 does consistently look quicker, there could be surprises, such as data dependencies (e.g. compare ./a.out 42 version1 1 version1
) or CPU/cache/heap warm-up (e.g. compare ./a.out 42 version1 version2
vs. ./a.out 42 version2 version1
).
Profiling
Comparing the output of
valgrind --tool=callgrind --dump-instr=yes ./release/sotest 42 version1
valgrind --tool=callgrind --dump-instr=yes ./release/sotest 42 version2
We see that, highlevel indeed version2 allocates more, but version1 spends significantly more time doing deletion:

Looking at those deletes,

Version1 just does more of them:

Looking at the callsites, the code structure is not the difference:

Both versions call raw_remove
exactly 1'000'000 times:

So, we can conclude with near certainty that the difference is data dependent. Because pts
is identical both versions, it must be down to the order in which we remove points. For unordered_set
the order is basically unspecified but comparing with std::set
(version3
below) shows identical behaviour.
It follows that following iteration order for the rtree
is somewhat of a worst-case order in which to be removing the nodes.
Note that the tree algorithm and tuning makes a big difference too. Using rstar<32>
or even bigger increases the disadvantage for version1
and removes any difference of other versions:

Out Of The Box, Conclusions, Fixes
If the order is not important, I'd suggest version4
:
void version4(Data const& pts) {
std::cout << "version 4 ====================================\n";
Tree rtree(pts);
std::cout << "Tree contains " << rtree.size() << " points." << std::endl;
auto start = now();
for (auto& pt : pts) {
[[maybe_unused]] auto [pt_temp, pt_id] = pt;
rtree.remove(pt);
}
std::cout << "elapsed time: " << (now() - start) / 1.s << "s\n";
if (rtree.empty()) {
std::cout << "empty rtree " << std::endl;
}
}
In fact, the fastest way is not removing the nodes in the first place. Even using tree iteration order will be way faster here:
void version5(Data const& pts) {
std::cout << "version 5 ====================================\n";
Tree rtree(pts);
std::cout << "Tree contains " << rtree.size() << " points." << std::endl;
auto start = now();
std::set<int> ids_soft_removed;
for (auto& [pt, id] : rtree) {
ids_soft_removed.insert(id);
}
std::cout << "elapsed time: " << (now() - start) / 1.s << "s\n";
std::cout << "ids soft-removed: " << ids_soft_removed.size() << std::endl;
}
Of course, this is context-free brainstorming, because in this code we could just drop the rtree
altogether.
Full Live Demo
#define NDEBUG
#define BOOST_DISABLE_ASSERTS 1
#include <boost/geometry.hpp>
#include <chrono>
#include <iostream>
#include <random>
#include <unordered_set>
namespace bg = boost::geometry;
namespace bgi = bg::index;
namespace bgm = bg::model;
typedef bgm::point<float, 2, bg::cs::cartesian> Point;
typedef std::pair<Point, int> PtPair;
auto now = std::chrono::steady_clock::now;
using namespace std::chrono_literals;
using Data = std::vector<PtPair>;
using Tree = bgi::rtree<PtPair, bgi::rstar<16>>;
static Data generate_points(int const NP, unsigned seed, int rdmin = 1, int rdmax = 1'000) {
Data pts;
pts.reserve(NP);
auto roll_die = bind(std::uniform_int_distribution(rdmin, rdmax), std::mt19937{seed});
for (int i = 0; i < NP; ++i)
pts.emplace_back(Point(roll_die(), roll_die()), i);
return pts;
}
void version1(Data const& pts) {
Tree rtree(pts);
std::cout << "Tree contains " << rtree.size() << " points." << std::endl;
std::cout << "version 1 ====================================\n";
auto start = now();
while (!rtree.empty()) {
auto it_tree = rtree.begin();
// std::cout << "node ID: " << (rtree.begin())->second <<
// std::endl; std::cout << "node ID: " << it_tree->second <<
// std::endl;
[[maybe_unused]] auto pt_temp = it_tree->first;
[[maybe_unused]] auto pt_id = it_tree->second;
/*
code skipped: do some queries and calculation, remove the
points from the query results
*/
rtree.remove(*it_tree);
// std::cout << "Tree contains " << rtree.size() << "points."
// << std::endl;
}
std::cout << "elapsed time: " << (now() - start) / 1.s << "s\n";
if (rtree.empty()) {
std::cout << "empty rtree " << std::endl;
}
}
void version2(Data const& pts) {
std::cout << "version 2 ====================================\n";
std::unordered_set<int> set_op;
for (size_t i = 0; i < pts.size(); ++i)
set_op.insert(i);
Tree rtree(pts);
std::cout << "Tree contains " << rtree.size() << " points." << std::endl;
auto start = now();
// while (!set_op.empty()) { // seems as fast
while (!rtree.empty()) {
auto it = set_op.begin();
// std::cout << "node ID: " << (rtree.begin())->second <<
// std::endl; std::cout << "node ID: " << it_tree->second <<
// std::endl;
[[maybe_unused]] auto pt_temp = pts[*it].first;
[[maybe_unused]] auto pt_id = pts[*it].second;
/*
same code skipped: do some queries and calculation, remove the
points from the query results, together with removing the
corresponding pt_id from set_op.
*/
rtree.remove(pts[*it]);
set_op.erase(it);
// std::cout << "Tree contains " << rtree.size() << "points." << std::endl;
}
std::cout << "elapsed time: " << (now() - start) / 1.s << "s\n";
if (rtree.empty()) {
std::cout << "empty rtree " << std::endl;
}
}
void version3(Data const& pts) {
std::cout << "version 3 ====================================\n";
std::set<int> set_op;
for (size_t i = 0; i < pts.size(); ++i)
set_op.insert(i);
Tree rtree(pts);
std::cout << "Tree contains " << rtree.size() << " points." << std::endl;
auto start = now();
while (!rtree.empty()) {
auto it = set_op.begin();
[[maybe_unused]] auto pt_temp = pts[*it].first;
[[maybe_unused]] auto pt_id = pts[*it].second;
rtree.remove(pts[*it]);
set_op.erase(it);
}
std::cout << "elapsed time: " << (now() - start) / 1.s << "s\n";
if (rtree.empty()) {
std::cout << "empty rtree " << std::endl;
}
}
void version4(Data const& pts) {
std::cout << "version 4 ====================================\n";
Tree rtree(pts);
std::cout << "Tree contains " << rtree.size() << " points." << std::endl;
auto start = now();
for (auto& pt : pts) {
[[maybe_unused]] auto [pt_temp, pt_id] = pt;
rtree.remove(pt);
}
std::cout << "elapsed time: " << (now() - start) / 1.s << "s\n";
if (rtree.empty()) {
std::cout << "empty rtree " << std::endl;
}
}
void version5(Data const& pts) {
std::cout << "version 5 ====================================\n";
Tree rtree(pts);
std::cout << "Tree contains " << rtree.size() << " points." << std::endl;
auto start = now();
std::set<int> ids_soft_removed;
for (auto& [pt, id] : rtree) {
ids_soft_removed.insert(id);
}
std::cout << "elapsed time: " << (now() - start) / 1.s << "s\n";
std::cout << "ids soft-removed: " << ids_soft_removed.size() << std::endl;
}
int main(int argc, char** argv) {
std::list<std::string_view> args(argv + 1, argv + argc);
auto seed = std::random_device{}();
Data pts = generate_points(1'000'000, seed);
for (auto arg:args) {
if (arg=="version1") version1(pts);
if (arg=="version2") version2(pts);
if (arg=="version3") version3(pts);
if (arg=="version4") version4(pts);
if (arg=="version5") version5(pts);
seed = ::atoi(arg.data());
std::cout << "Using seed " << seed << "\n";
pts = generate_points(1'000'000, seed);
}
}
Printing:
g++ -std=c++20 -O2 main.cpp && ./a.out 42 version{1..5}
Using seed 42
Tree contains 1000000 points.
version 1 ====================================
elapsed time: 1.29204s
empty rtree
Using seed 0
version 2 ====================================
Tree contains 1000000 points.
elapsed time: 1.03919s
empty rtree
Using seed 0
version 3 ====================================
Tree contains 1000000 points.
elapsed time: 1.0509s
empty rtree
Using seed 0
version 4 ====================================
Tree contains 1000000 points.
elapsed time: 0.934573s
empty rtree
Using seed 0
version 5 ====================================
Tree contains 1000000 points.
elapsed time: 0.750064s
ids soft-removed: 1000000
Using seed 0
As you can see, the fastest order in which to delete is to not delete at all.