2

What is the best way to verify if there are common members within multiple vectors? The vectors aren't necessarily of equal size and they may contain custom data (such as structures containing two integers that represent a 2D coordinate).

For example:

vec1 = {(1,2); (3,1); (2,2)};
vec2 = {(3,4); (1,2)};

How to verify that both vectors have a common member?

Note that I am trying to avoid inneficient methods such as going through all elements and check for equal data.

Pedro Batista
  • 1,100
  • 1
  • 13
  • 25
  • You'll need to go thru all elements (or notice their addition into the vectors). You could keep a `std::set` or `std::unordered_set` of elements in addition of the vectors. – Basile Starynkevitch Jan 28 '14 at 16:11
  • How do you expect to find a match if you are going through all elements and checking for equality? You have to check for answers, you can build a "guessing" algorithm, but that only works if you know the structure beforehand. As they are vectors you can _probably_ iterate to find a close spot, not sure if that is more efficient however. – Emz Jan 28 '14 at 16:12

3 Answers3

2

For non-trivial data sets, the most efficient method is probably to sort both vectors, and then use std::set_intersection function defined in , like follows:

#include <vector>
#include <algorithm>


using namespace std;

typedef vector<pair<int, int>> tPointVector;

tPointVector vec1 {{1,2}, {3,1}, {2,2}};
tPointVector vec2 {{3,4}, {1,2}};

std::sort(begin(vec1), end(vec1));
std::sort(begin(vec2), end(vec2));

tPointVector vec3;
vec3.reserve(std::min(vec1.size(), vec2.size()));
set_intersection(begin(vec1), end(vec1), begin(vec2), end(vec2), back_inserter(vec3));

You may get better performance with a nonstandard algorithm if you do not need to know which elements are different, but only the number of common elements, because then you can avoid having to create new copies of the common elements.
In any case, it seems to me that starting by sorting both containers will give you the best performance for data sets with more than a few dozen elements.

Here's an attempt at writing an algorithm that just gives you the count of matching elements (untested):

auto it1 = begin(vec1);
auto it2 = begin(vec2);
const auto end1 = end(vec1);
const auto end2 = end(vec2);
sort(it1, end1);
sort(it2, end2);

size_t numCommonElements = 0;
while (it1 != end1 && it2 != end2) {
    bool oneIsSmaller = *it1 < *it2;
    if (oneIsSmaller) {
        it1 = lower_bound(it1, end1, *it2);
    } else {
        bool twoIsSmaller = *it2 < *it1;
        if (twoIsSmaller) {
            it2 = lower_bound(it2, end2, *it1);
        } else {
            // none of the elements is smaller than the other
            // so it's a match
            ++it1;
            ++it2;
            ++numCommonElements;
        }
    }
}
Martin J.
  • 5,028
  • 4
  • 24
  • 41
  • Thanks for the useful answers, i learn alot in this website. I had to go another way to solve my problem actually, but these were great answers. – Pedro Batista Jan 29 '14 at 11:04
1

Note that I am trying to avoid inneficient methods such as going through all elements and check for equal data.

You need to go through all elements at least once, I assume you're implying you don't want to check every combinations. Indeed you don't want to do :
for all elements in vec1, go through the entire vec2 to check if the element is here. This won't be efficient if your vectors have a big number of elements.

If you prefer a linear time solution and you don't mind using extra memory here is what you can do :

You need a hashing function to insert element in an unordered_map or unordered_set See https://stackoverflow.com/a/13486174/2502814

// next_permutation example
#include <iostream>     // std::cout
#include <unordered_set> // std::unordered_set
#include <vector>       // std::vector

using namespace std;


namespace std {
  template <>
  struct hash<pair<int, int>>
  {
    typedef pair<int, int>      argument_type;
    typedef std::size_t  result_type;

    result_type operator()(const pair<int, int> & t) const
    {
       std::hash<int> int_hash;
       return int_hash(t.first + 6495227 * t.second);
    }
  };
}


int main () {
  vector<pair<int, int>> vec1 {{1,2}, {3,1}, {2,2}};
  vector<pair<int, int>> vec2 {{3,4}, {1,2}};


  // Copy all elements from vec2 into an unordered_set
  unordered_set<pair<int, int>> in_vec2;
  in_vec2.insert(vec2.begin(),vec2.end());

  // Traverse vec1 and check if elements are here
  for (auto& e : vec1)
  {
    if(in_vec2.find(e) != in_vec2.end()) // Searching in an unordered_set is faster than going through all elements of vec2 when vec2 is big.
    {
      //Here are the elements in common:
      cout << "{" << e.first << "," <<  e.second << "} is in common!" << endl;
    }

  }

  return 0;
}

Output : {1,2} is in common!

You can either do that, or copy all elements of vec1 into an unordered_set, and then traverse vec2. Depending on the sizes of vec1 and vec2, one solution might be faster than the other. Keep in mind that picking the smaller vector to insert in the unordered_set also means you will use less extra memory.

Community
  • 1
  • 1
Vincent
  • 444
  • 4
  • 6
  • Thanks for the useful answers, i learn alot in this website. I had to go another way to solve my problem actually, but these were great answers. – Pedro Batista Jan 29 '14 at 11:03
0

I believe you use a 2D tree to search in 2 dimenstions. An optimal algorithm to the problem you specified would fall under the class of geometric algorithms. Maybe this link is of use to you: http://www.cs.princeton.edu/courses/archive/fall05/cos226/lectures/geosearch.pdf .

aneez
  • 1
  • 1