5

The problem I am trying to solve is a following: I have a container of floating points (vector of double vectors):

std::vector<std::vector<double>> dv { {0.0, 0.0}, {1.0, 0.0}, {0.0, 1.0}, {1.0, 1.0} };

Then, assume I have a new point (double vector):

std::vector<double> v1 {0.0001, 1.0};

I want to check if a point v1 is present in dv container based on some tollerance. The distance between two vectors is calculated as Euclidean distance.

I have looked to the relevant questions and answers:

and was also trying to use std::find_if() without any success, as it accepts only unary predicate.

At the moment I came up with a temporary solution. First, I created a generic function to find the Euclidean distance between two vectors:

template <typename InputIt1, typename InputIt2>
double EuclideanDistance(InputIt1 beg1, InputIt1 end1, InputIt2 beg2) {
  double val = 0.0;
  while (beg1 != end1) {
    double dist = (*beg1++) - (*beg2++);
    val += dist*dist;
  }
  return val > 0.0? sqrt(val) : 0.0;
}

Second, I created check_if function to check if an element is presented in a container based on the tolerance (Epsilon):

template <typename Container, typename Element>
bool check_if(const Container& c, const Element& el,
              const double Epsilon = 0.001) {
  auto pos = c.begin();
  for (; pos != c.end(); ++pos) {
    if (EuclideanDistance(pos->begin(), pos->end(), el.begin()) < Epsilon) {
      return true;
    }
  }
  return false;
}

And then I can use my code in a context like this:

// Check if container contains v1 using check_if()
if (check_if(dv, v1)) {
  std::cout << "Using check_if() - Container contains v1\n";
}

So my questions are following:

  1. Is there in-house an STL algorithm to achieve the same goal? If no, how could I improve my code? For example, I'm not sure how could I use any distance function in check_in() instead of EuclideanDistance()?
  2. I wonder if AshleysBrain suggestion (https://stackoverflow.com/a/3451045/3737891) to use std::set instead of std::vector could make any difference having a container of floating points?
Community
  • 1
  • 1
Remis
  • 571
  • 4
  • 12
  • 1
    I would store point in `std::pair` or `std::array` - `std::vector` is an overkill for fixed size data, plus you need to validate that it always has 2 elements – Slava Jul 25 '16 at 18:21
  • 1
    Use a lambda that captures `Epsilon` and pass that lambda to `find_if` – Praetorian Jul 25 '16 at 18:22
  • 2
    Try again with `std::find_if` because I can't see any reason it wouldn't work. – juanchopanza Jul 25 '16 at 18:23
  • With lambda, something like: `std::find_if(dv.begin(), dv.end(), [&](const auto&inner) { return EuclideanDistance(inner.begin(), inner.end(), v1.begin()) < Epsilon; })` – Jarod42 Jul 25 '16 at 18:24
  • @Jarod42 it cannot be `v1.begin()` - OP stores point xy coordinates in that vector – Slava Jul 25 '16 at 18:25
  • @Slava: `EuclideanDistance` takes `iterator` for container of `double`. It would be clearer if OP used `std::vector`, so it would be `EuclideanDistance(pt, expected_pt) < Epsilon`. – Jarod42 Jul 25 '16 at 18:29
  • @Jarod42 if OP would use `Point` then `v1` would be an object of that type and you would not need `v1.begin()` either. – Slava Jul 25 '16 at 18:30
  • @Slava: `v1`, as `dv[x]`, are `vector` to be passed to `EuclideanDistance` with their iterators. – Jarod42 Jul 25 '16 at 18:33
  • @Jarod42 I see, my mistake – Slava Jul 25 '16 at 18:34
  • Maybe use boost geometry library: http://www.boost.org/doc/libs/1_61_0/libs/geometry/doc/html/index.html – matiu Jul 25 '16 at 18:39
  • @Slava In my case points can be any dimensionality vectors. Only during the execution, I will get these points from the external program and my goal is to store them in some good container and later to check if the new points are already in this container. This allows me to avoid lots of expensive procedures. In my case, efficiency really matters, and in general, I can have thousands of ~50-dimensional points (vectors), thus the right container and the algorithm can have a huge impact on the performance. – Remis Jul 25 '16 at 20:56
  • @Remis I see, but your algo should not work on points with different dimentions at the same time, correct? Then I would make template method on `int N` and use `std::array` - that will guarantee points of the same dimentions. With `vector` you have to make sure that dimentions are the same, otherwise for example `EuclideanDistance` will crash – Slava Jul 26 '16 at 01:12

3 Answers3

12

and was also trying to use std::find_if() without any success, as it accepts only unary predicate.

That's all you want anyway. The single argument is going to be the elements of the vector you're looking in.

std::vector<std::vector<double>> dv{
    {0.0, 0.0}, {1.0, 0.0}, {0.0, 1.0}, {1.0, 1.0}};

std::vector<double> v1 {0.0001, 1.0};

auto find_result =
    std::find_if(begin(dv), end(dv), [&v1](std::vector<double> const &dve) {
      assert(dve.size() == v1.size());
      return EuclideanDistance(begin(dve), end(dve), begin(v1)) < 0.001;
    });

On another note, if your points are all a fixed dimension then using vector<double> probably is not the right representation. A tuple<double,double>, or an array<double 2>, or a custom class that has overloaded operators would probably all be appropriate replacements.

bames53
  • 86,085
  • 15
  • 179
  • 244
  • In my case, all points are n-dimensional vectors. However, only during execution, I will know the exact dimensions (n) of these points, i.e., for the different runs dimensionality of these points can differ. However, during the same run, all points have the same dimensionality. Thus, my goal is to choose the right container to store these points and later to use, an efficient algorithm to check if a new point is already in this container. – Remis Jul 25 '16 at 20:40
2

Is there in-house an STL algorithm to achieve the same goal? If no, how could I improve my code? For example, I'm not sure how could I use any distance function in check_in() instead of EuclideanDistance()?

Non exactly STL (if I'm not wrong), but...

As suggested by others, instead of a std::vector<double>, for points coordinates, you should consider std::tuple<double, double, ...> or, if they are 2D points (your case, if I'm not wrong), std::pair<double, double>.

But there is another tamplated class that I suggest you (but only for 2D points, not for 3D points): std::complex<T>

Your dv could be

std::vector<std::complex<double>> dv { {0.0, 0.0}, {1.0, 0.0}, {0.0, 1.0}, {1.0, 1.0} };

The new point

std::complex<double>  v1 {0.0001, 1.0};

Another suggestion: avoid the square root (it's costly to compute); check the square of the distance against the square of Epsilon; using std::norm of the difference of a couple of complex numbers you get exactly (if I'm not wrong) the square of the distance

Using std::complex<double>, the example from bames53 become simply

#include <vector>
#include <complex>
#include <iostream>
#include <algorithm>

int main()
 {
   std::vector<std::complex<double>> dv
    { {0.0, 0.0}, {1.0, 0.0}, {0.0, 1.0}, {1.0, 1.0} };

   std::complex<double> v1 {0.0001, 1.0};

   auto find_result =
      std::find_if(begin(dv), end(dv), [&v1](std::complex<double> const &dve)
                   { return std::norm(v1-dve) < 0.000001; });

   std::cout << *find_result << std::endl;  // print (0,1)

   return 0;
 }

--- EDIT ---

I wonder if AshleysBrain suggestion (https://stackoverflow.com/a/3451045/3737891) to use std::set instead of std::vector could make any difference having a container of floating points?

I don't think that the "floating point" component is important in choosing between std::vector and std::set.

By example, if (for your application) it's important to mantain the order of insertion of points, you should use std::vector (or std::queue); if you have to do a lot of searches, I think is better std::set (or std::multiset if you use point with molteplicity`).

If you decide to use std::complex<double>, I give you another suggestion: give a look at std::valarray.

I don't know what exactly do you want to do with your points but... I suppose can be useful.

p.s.: sorry for my bad English

Community
  • 1
  • 1
max66
  • 65,235
  • 10
  • 71
  • 111
1

From your comment ("In my case points can be any dimensionality vectors...") I understand that the suggestion to use std::complex<double> for point coordinates (or std::pair<double, double>, or std::tuple<double, double, ...>) is inapplicable.

So i suggest to use, instead of std::vector<double>, std::array<double, N> where N is the dimension of the point.

I mantain my suggestion to avoid square root and check square of distance against square of Epsilon.

My third suggestion is to use std::inner_product() in the following way (example with N == 3)

#include <vector>
#include <array>
#include <numeric>
#include <iostream>
#include <algorithm>

int main()
 {
   std::vector<std::array<double, 3>> dv
    { {{0.0, 0.0, 0.0}}, {{1.0, 0.0, 0.0}},
      {{0.0, 1.0, 0.0}}, {{0.0, 0.0, 1.0}},
      {{1.0, 1.0, 0.0}}, {{1.0, 0.0, 1.0}},
      {{0.0, 1.0, 1.0}}, {{1.0, 1.0, 1.0}} };

   std::array<double, 3> v1 { {0.0001, 1.0, -0.0001} };

   auto result = std::find_if(dv.cbegin(), dv.cend(),
           [&v1](std::array<double, 3> const & dve)
            { return std::inner_product(v1.cbegin(), v1.cend(),
                 dve.cbegin(), 0.0, std::plus<double>(),
                 [](double d1, double d2){ auto d = d1-d2; return d*d; })
                 < 0.00001; });

   std::cout << "result: ";

   for ( const auto & d : *result )
      std::cout << '(' << d << ')';

   std::cout << std::endl;

   return 0;
 }

The output is

result: (0)(1)(0)

Sorry again for my English.

max66
  • 65,235
  • 10
  • 71
  • 111