count the frequency of groups in vector using rtree ( or any other algorithm )

Question

given following set of points in a vector {( 100, 150 ), ( 101, 152 ), ( 102, 151 ), ( 105, 155 ), ( 50, 50 ), ( 51, 55 ), ( 55, 55 ), ( 150, 250 ), ( 190, 260 ) }

I need to identify neighboring points and their count. Let us say the acceptable distance has been set as 5. Now I need following output: frequency of point ( 100, 150 ) with in 5 units is 4. frequency of point ( 50, 50 ) with in 5 units is 3 frequency of point ( 150, 250 ) within 5 units is 1 frequency of point ( 190, 260 ) within 5 units is 1

I have tried a RTree solution to this problem but couldn't determine the logic to exclude all neighboring points as candidates. Means Once I have identified there are four neighbors of ( 100, 150 ), I don't want to identify the neighbors of those neighbors. I would like to move on to next value. Here are the presumptions: 1. efficiency is the topmost concern 2. The vector is unsorted 3. the vector may contain thousands of points. I am using C++ and boost implementation of RTree. Please guide me how can I achieve the solution

Here is the code following code which does counts the number of neighbors of unique points in the vector. I need guidance on excluding neighbors of a point once they have been identified.

       include set, iostream, boost/geometry.hpp,       boost/geometry/geometries/point.hpp, boost/geometry/index/rtree.hpp

      using namespace std;
      namespace bg = boost::geometry;
      namespace bgi = boost::geometry::index;

     typedef bg::model::point<int, 2, bg::cs::cartesian> point;
     typedef std::pair<point, unsigned> value;

    struct ltstr
    {
       bool operator()(const point &p1, const point &p2) const
    {
        return (p1.get < 0 >() < p2.get < 0 >() || p1.get < 1 >() < p2.get < 1 >());
}
   };


       void main()
      {
vector<point> candidatePoints{ point(457, 184), point(457, 184), point(457, 184), point(457, 184), point(457, 184),
    point(456, 184), point(456, 184), point(456, 184), point(456, 184), point(456, 184),
    point(456, 184), point(457, 184), point(457, 184), point(457, 184), point(458, 184), point(459, 185) };

bgi::rtree< value, bgi::quadratic<16> > rtree;

set<point, ltstr> uniqueCandidatePoints;

for (int i = 0; i < candidatePoints.size(); ++i)
{
    int x = candidatePoints[i].get < 0 >();
    int y = candidatePoints[i].get < 1 >();
    uniqueCandidatePoints.insert(point(x, y));
    rtree.insert(make_pair(candidatePoints[i], i));
}

for (auto it = uniqueCandidatePoints.begin(); it != uniqueCandidatePoints.end(); ++it)
{
    std::vector<value> returnedValues;
    point currentItem = *it;
    rtree.query(bgi::satisfies([&](value const& v) {return bg::distance(v.first, currentItem) < 5; }),
        std::back_inserter(returnedValues));

    cout << "Current Item: " << currentItem.get < 0 >() << "," << currentItem.get < 1 >() << "Count: " << returnedValues.size() << endl;
} 

getchar();
  }

Not broad at all, on the contrary it's quite specific. I have prepered the answer, please reopen so that I can post it — Nikos Athanasiou, Jun 29 '15 at 19:14
@NikosAthanasiou -- your edit to the question really should have been a comment. I've rolled back the edit, but feel free to link OP to your answer here in the comments. — Michael0x2a, Jun 29 '15 at 19:45
@Michael0x2a My edit to the question should have been an answer and this question should not have been closed. I've struggled with such topics in the past and can understand how frustrating it can be. Anyway, here's the [**Link**](http://coliru.stacked-crooked.com/a/3867d770da36348a), have fun, I'd appreciate a notification if the topic reopens (there's a lot to be said about the code in link) — Nikos Athanasiou, Jun 29 '15 at 21:58
@NikosAthanasiou thanks for the elegant code using rtree. But it doesn't do exactly what i want to do. Once the neighbors are identified I want to exclude them in next step. In the output of your code neighborhood of point { 100, 150 } ----------------- { 101, 152 } { 102, 151 } ----------------- now i don't want to compute neighbors for {101, 152} and {102, 151}. I would like to continue check for {105, 155}. Kindly guide me how can I achieve this. — Prem, Jun 30 '15 at 01:37
@NikosAthanasiou I have included my code in the question. I can see your code is more elegant. Can you please help me for the final solution. Cause you were the one who understood exactly what I needed at the first place. — Prem, Jul 01 '15 at 08:54

Nikos Athanasiou · Accepted Answer · 2015-07-06T07:43:08.160

R-trees are one of the most useful spatial indexing data structures yet are proven useful for specific domains and problems. That been said, that's not a reason to refrain from being didactic (after all what's asked may be a simplification of the actual problem).

If you choose to use R-trees you are performing a domain decomposition. Much like space filling curves you can have an ordering of the space at hand, yet node elements maintan spatial vicinity (the further you get away from the root).

An ideal solution would be to build your R-Trees in a way that radius=5 regions are formed but that would require a custom data structure and a customization of STR or bulk loading algorithms and would resemble to a clustering algorithm.

With boost::index available you can identify all neiborhoods, I'll try to elaborate on the code :

Mandatory includes

#include <vector>
#include <iostream>
#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point.hpp>
#include <boost/geometry/geometries/box.hpp>
#include <boost/geometry/index/rtree.hpp>

Definitions

namespace bg  = boost::geometry;
namespace bgi = boost::geometry::index;   
using  point  = bg::model::point < float, 2, bg::cs::cartesian > ;

Auxiliaries

Boost R-Trees have a query method. Though it's designed to perform typical queries like kNN or overlaping you can feed it custom predicates. Here we design one that returns true if the point we query is up to max_dist away from a base point (both variables specified at construction)

struct distance_pred
{
    point const& _base; 
    double       _threshold; 

    distance_pred(point const& base, double max_dist)
        : _base(base)
        , _threshold(max_dist)
    {
    }
    bool operator()(point const& p) const
    {
        auto d = boost::geometry::distance(_base, p); 
        return d && d < _threshold; 
    }
};

// just for output
std::ostream& operator<<(std::ostream &os, point const& p)
{
    os << "{ " << p.get<0>() << ", " << p.get<1>() << " }"; 
    return os; 
}

Execution

For every point, we query those that lie at most distance=5 apart

int main()
{
    std::vector<point> cloud {
        point(100, 150), point(101, 152), point(102, 151), 
        point(105, 155), point( 50,  50), point( 51,  55), 
        point( 55,  55), point(150, 250), point(190, 260) 
    }; 

    bgi::rtree<point, bgi::quadratic<16>> rtree(cloud.begin(), cloud.end());

    std::vector<point> hood;
    for (auto &&p : cloud)
    {
        hood.clear(); 
        std::cout << "neighborhood of point " << p << "\n-----------------\n\n";

        rtree.query(bgi::satisfies(distance_pred(p, 5)), std::back_inserter(hood)); 

        // Output the results -----------------------------------------
        if (!hood.empty())
        {
            for (auto &&pt : hood) std::cout << '\t' << pt << std::endl;
        }
        else
        {
            std::cout << "\t... is empty\n"; 
        }

        std::cout << std::endl; 
    }
}

Extensions

If you want to exlude something, I believe a clustering algorithm would be more fitting and that is out of RTrees scope. For example what if the points you exluded due to lying close to point1 happen to lie closer to a point2 ?

Yet if you really want to do it, it's just a matter of bookeeping. Define a point like so

using  pointI  = std::pair<point, std::size_t>; // remember : geometric info first

and transform you for loop to

for (std::size_t i(0), i < cloud.size(); ++i)
{
    if (cloud.end() != std::find(rec.begin(), rec.end(), i))
    { // you'll only be building neighorhoods for points that are not touched
        // queries and recording performed on point2 types  
    }
}

The Full code demonstrates the problems in this logic : lots of neighborhoods remain empty.

Same as the above can be achieved with much less code yet sightly more complicated (basically we put the query into a lambda function and use query iterators to loop through the results) Demo

I am getting error message for auto in following lambada function. can you please suggest what it should be replaced with? std::for_each(hood.begin(), hood.end(), [&rec](auto &&pt) { rec.push_back(pt.second); — Prem, Jul 01 '15 at 17:08
@Prem replace `auto` with `pointI const&` and get a compiler update for Visual Studio — Nikos Athanasiou, Jul 01 '15 at 17:27
Thanks a bunch... this is exactly what I needed when I posted this question.... Thank you again.... — Prem, Jul 02 '15 at 01:20
can you please explain the logic of following in the code? if (rec.end() == std::find(rec.begin(), rec.end(), i)) and // exlude neighborhood from following queries std::for_each(hood.begin(), hood.end(), [&rec](auto &&pt) { rec.push_back(pt.second); — Prem, Jul 05 '15 at 07:05
@Prem You can either ask a different SO question (they are free) or google the following keywords : `std::find` `std::for_each` and `lambda functions`. I doubt I could elaborate on them in the form of comments. You can notify me with a link to the question even though I believe it'll be quickly answered — Nikos Athanasiou, Jul 05 '15 at 16:01
i have asked this question http://stackoverflow.com/questions/31320633/packing-algorithm-in-rtree-in-boost can we use packing algorithm for the same example instead of quadratic, packing seems to be way to faster than other variations. As we already have values to be inserted in rtree in a vector there should be no problem specifying range while creating rtree. Thanks in advance... :) — Prem, Jul 09 '15 at 14:39
There is one problem with the code. UnaryPredicate passed into the `satisfies()` BGIPredicate is checked for each Value stored in the R-tree. So effectively the spatial indexing features are not used during this query because all Values will be checked anyway. In addition to your UnaryPredicate you should pass a spatial predicate to filter unwanted areas of space, e.g.: `bgi::intersects(box(point(bg::get<0>(p)-5, bg::get<1>(p)-5), point(bg::get<0>(p)+5, bg::get<1>(p)+5))) && bgi::satisfies(distance_pred(p, 5))` — Adam Wulkiewicz, Oct 22 '15 at 14:56
@Prem The Boost.Geometry R-tree is created with packing algorithm when a Range is passed into the constructor. So actually this case is covered by the answer. — Adam Wulkiewicz, Oct 22 '15 at 15:01

score 0 · Answer 2 · answered Jun 29 '15 at 17:28

0

R-tree is just a data structure but not an algorithm and to me it looks quite complicated. Unless you really need to deal with mirco efficiency I would use plain standard algorithms and a vector of points. std::count_if would be my first guess.

answered Jun 29 '15 at 17:28

463035818_is_not_an_ai

109,796
11
89
185

I decided to use R-tree after long and frustrating attempt to solve using count_if. Can I get some more elaboration on solution? This is not the solution at all. And to the veterans who voted my question as broad and generic, do you want me to post the code which i wrote in an attempt to solve this problem to be specific? – Prem Jun 30 '15 at 01:56
@Prem again: I does not make much sense to decide between R-Tree **or** using `count_if`. One is a data structure and the other is an algorithm (working on certain data structures). And yes, posting any effort you already made can only help to get better answers. Given the information you provide at the moment, pointing you towards the standard algorithm is the best I can do to help you. – 463035818_is_not_an_ai Jun 30 '15 at 11:52
I understand the difference very well. My intention was to say, I tried using count_if with simple vector first and then I used RTree to organize my data and retrieve the needed information from RTree. My code does exactly the same thing as the code posted by Nikos Athanasiou. Can you please suggest some changes in the code to exclude the neighbors of previous points. e.g. In the output of the code neighborhood of point { 100, 150 } are { 101, 152 } { 102, 151 } now i don't want to compute neighbors for {101, 152} and {102, 151}. I would like to continue check for {105, 155}. – Prem Jul 01 '15 at 08:22
R-tree *is* a collection of algorihms: insert,search,delete etc. – Has QUIT--Anony-Mousse Jul 01 '15 at 17:56
@Anony-Mousse Following this logic also `std::vector` is a collection of algorithms: insert, delete etc. I am almost sure that one can write algorithms (e.g. search) that work as well on a R-tree as on a `std::vector` (e.g. by using iterators). Thus in this context I would refer to R-tree as a data structure rather than algorithms. Actually any container has to offer some way to insert/delete elements and this does not make it an algorithm. However, I am not a professional and I am always happy to get corrected when I am talking nonsense ;) – 463035818_is_not_an_ai Jul 02 '15 at 07:57
yes, these are also algorithms. Inserting into an vector usually are in O(n) or O(1), depending on the algorithm. It's fairly easy to make a worse insertion algorithm, too. – Has QUIT--Anony-Mousse Jul 02 '15 at 12:08