0

I am looking for an efficient way to obtain the difference between two std::vectors. The objects they contain don't have any natural ordering; the best I can do is test for equality. This seems to rule out the std::set_difference option.

Is there a better solution than to compare the objects inside two nested iterators?

Matt Dunn
  • 5,106
  • 6
  • 31
  • 55
  • How big are those vectors? – Andy Prowl Mar 22 '13 at 10:56
  • Worst case scenario, there could be up to 7000 elements in each. – Matt Dunn Mar 22 '13 at 10:57
  • 3
    You don’t need a *natural* ordering as long as you can impose an artificial, internally consistent ordering. It doesn’t have to make sense, just satisfy the criteria for a strict weak ordering. – Konrad Rudolph Mar 22 '13 at 10:58
  • Okay, so I could just order them by some arbitrary parameter then? – Matt Dunn Mar 22 '13 at 10:59
  • 2
    @Dunnie: There is an overload of `set_difference` that accepts a predicate as its last argument. Just provide a predicate that establishes a strict weak ordering. For instance, by using `operator <` on some member variable of those objects. – Andy Prowl Mar 22 '13 at 11:02

2 Answers2

0

The goal is to reduce the number of equality test, by grouping possible positive matches together.

The best I can think of is to build a hashmap with the first vector, and then substract all the elements of the second vector from the hashmap. This means that you need to come up with a decent hash function for your elements.

Trivially, if you are storing pointers, and that your equality predicate is based on it, you could also base your hash on that pointer integral value.

See What is a good hash function?.

As mentioned in the comments, an alternate possibility is to establish an ordering on certain attributes of the elements, and use that to reduce the possibilities of equality. You may have to sort both vectors first.

Community
  • 1
  • 1
didierc
  • 14,572
  • 3
  • 32
  • 52
  • my answer is very language agnostic. I cannot tell which solution would yeld better performances. – didierc Mar 22 '13 at 11:49
0

You can take advantage of a property of vectors, they have contiguous memory allocation, that means, vector have a one big memory space reserved (bigger then used, normally), and without fancy optimizations, you can compare directly memory spaces.

In C++0x, you have data() method, that give you direct access of the beginning of that memory space. And you can use memcmp to compare all data inside vectors.

std::vector<char> vector_1;
std::vector<char> vector_2;

// data in to vector_1
// data in to vector_2

if(!memcmp(vector1.data(),vector_2.data(),SIZE_OF_BUFFER_TO_COMPARE))
{
    std::cout <<< "equal vectors" << std::endl;
} 
JP Cordova
  • 119
  • 9
  • Rather pointless: `std::equal_range` works just as well for `std::vector`, but also works for `std::list`. More importantly, you miss the point. The 2 vectors may have their elements in different order. – MSalters Mar 22 '13 at 23:24