25

I'm trying to create a generic function that removes duplicates from an std::vector. Since I don't want to create a function for each vector type, I want to make this a template function that can accept vectors of any type. Here is what I have:

//foo.h

Class Foo {

template<typename T>
static void RemoveVectorDuplicates(std::vector<T>& vectorToUpdate);

};

//foo.cpp

template<typename T>
void Foo::RemoveVectorDuplicates(std::vector<T>& vectorToUpdate) {
for(typename T::iterator sourceIter = vectorToUpdate.begin(); (sourceIter != vectorToUpdate.end() - 1); sourceIter++) {
        for(typename T::iterator compareIter = (vectorToUpdate.begin() + 1); compareIter != vectorToUpdate.end(); compareIter++) {
            if(sourceIter == compareIter) {
                vectorToUpdate.erase(compareIter);
            }
        }
    }
}

//SomeOtherClass.cpp

#include "foo.h"

...

void SomeOtherClass::SomeFunction(void) {
    std::vector<int> myVector;

    //fill vector with values

    Foo::RemoveVectorDuplicates(myVector);
}

I keep getting a linker error, but it compiles fine. Any ideas as to what I'm doing wrong?

UPDATE: Based on the answer given by Iraimbilanja, I went and rewrote the code. However, just in case someone wanted working code to do the RemoveDuplicates function, here it is:

//foo.h

Class Foo {

    template<typename T>
    static void RemoveVectorDuplicates(T& vectorToUpdate){
        for(typename T::iterator sourceIter = vectorToUpdate.begin(); sourceIter != vectorToUpdate.end(); sourceIter++) {
            for(typename T::iterator compareIter = (sourceIter + 1); compareIter != vectorToUpdate.end(); compareIter++) {
            if(*sourceIter == *compareIter) {
                compareIter = vectorToUpdate.erase(compareIter);
            }
        }
    }
};

Turns out that if I specify std::vector in the signature, the iterators don't work correctly. So I had to go with a more generic approach. Also, when erasing compareIter, the next iteration of the loop produces a pointer exception. The post decrement of compareIter on an erase takes care of that problem. I also fixed the bugs in the iterator compare and in the initialization of compareIter in the 2nd loop.

UPDATE 2:

I saw that this question got another up vote, so figured I'd update it with a better algorithm that uses some C++14 goodness. My previous one only worked if the type stored in the vector implemented operator== and it required a bunch of copies and unnecessary comparisons. And, in hindsight, there is no need to make it a member of a class. This new algorithm allows for a custom compare predicate, shrinks the compare space as duplicates are found and makes a significantly smaller number of copies. The name has been changed to erase_duplicates to better conform to STL algorithm naming conventions.

template<typename T>
static void erase_duplicates(T& containerToUpdate) 
{
    erase_duplicates(containerToUpdate, nullptr);
}

template<typename T>
static void erase_duplicates(T& containerToUpdate, 
  std::function<bool (typename T::value_type const&, typename T::value_type const&)> pred) 
{
    auto lastNonDuplicateIter = begin(containerToUpdate);
    auto firstDuplicateIter = end(containerToUpdate);
    while (lastNonDuplicateIter != firstDuplicateIter) {
        firstDuplicateIter = std::remove_if(lastNonDuplicateIter + 1, firstDuplicateIter, 
            [&lastNonDuplicateIter, &pred](auto const& compareItem){
            if (pred != nullptr) {
                return pred(*lastNonDuplicateIter, compareItem);
            }
            else {
                return *lastNonDuplicateIter == compareItem;
            }
        });
        ++lastNonDuplicateIter;
    }
    containerToUpdate.erase(firstDuplicateIter, end(containerToUpdate));
}
bsruth
  • 5,372
  • 6
  • 35
  • 44
  • 1
    Put the definition with the declaration, i.e., move the function body into the header. Someone will soon post something about export, but basically the compiler will not generate code until it sees the template instantiated with a type. Or something like that. – MSN Jan 28 '09 at 19:26
  • Your implementation is not safe, because you are modifying your container while keeping your iterators to use inside the for loop. – Ismael Jan 28 '09 at 22:04
  • Iterators are not required to be aware of modification to the container (for performance reasons), and if some modifications cause the container to release/allocate memory, the iterators may end pointing to invalid memory. – Ismael Jan 29 '09 at 02:54
  • @xhantt: vector::erase() promises to leave all iterators to earlier elements in the sequence intact, so provided it is called as vectorToUpdate.erase(compareIter--) you are safe. – j_random_hacker Jan 29 '09 at 04:36
  • I probably could have done compareIter = vectorToUpdate.erase(compareIter); As that is probably a little more straight forward. – bsruth Jan 29 '09 at 14:42
  • The main problem with the original code was that you should have been using `std::vector::iterator` instead of `T::iterator` The changes you made to your loops are good, but you should really have `compareIter = --vectorToUpdate.erase(compareIter);`, because of the following `compareIter++` in the for loop. – Sumudu Fernando Nov 24 '09 at 02:55

6 Answers6

34

Short Answer

Define the function in the header, preferably inside the class definition.

Long answer

Defining the template function inside the .cpp means it won't get #included into any translation units: it will only be available to the translation unit it's defined in.

Hence RemoveVectorDuplicates must be defined in the header, as this is the only way the compiler can text-substitute the template arguments, hence instantiating the template, producing an usable class.

There are two workarounds for this inconvenience

First, you can remove the #include "foo.h" from the .cpp and add another one, in the end of the header:

#include "foo.cpp"

This lets you organize your files consistently, but doesn't provide the usual advantages of separate compilation (smaller dependencies, faster and rarer compiles).

Second, you can just define the template function in the .cpp and explicitly instantiate it for all the types it'll be ever used with.

For example, this can go in the end of the .cpp to make the function usable with ints:

template void Foo::RemoveVectorDuplicates(std::vector<int>*);

However, this assumes you only use templates to save some typing, rather than to provide true genericity.

Community
  • 1
  • 1
  • Great answer that finally answers this (and related) questions on SO with something *besides* a workaround, so this can be applied to other scenarios. – notbad.jpeg Feb 02 '17 at 01:31
5

One alternative you have is to first std::sort() the vector, and then use the pre-existing std::unique() function to remove duplicates. The sort takes O(nlog n) time, and removing duplicates after that takes just O(n) time as all duplicates appear in a single block. Your current "all-vs-all" comparison algorithm takes O(n^2) time.

j_random_hacker
  • 50,331
  • 10
  • 105
  • 169
  • 1
    This is the best answer here in my own humble opinion. Straightforward, uses reliable and effective proven code and very elegant. @bsruth, you should simply call these two functions consecutively. You can wrap it up in an inline method and have it neat and tidy. – sprite Aug 12 '13 at 06:25
2

You can't implement a template function in a .cpp file. The complete implementation has to be visible anywhere it's instantiated.

Just define the function inside the class definition in the header. That's the usual way to implement template functions.

jalf
  • 243,077
  • 51
  • 345
  • 550
  • 1
    Not true. See http://stackoverflow.com/questions/115703/storing-c-template-function-definitions-in-a-cpp-file // although, support for this has varied over the lifetime of C++. The accepted answer describes the official approach: Tell your C++ compiler which instantiations to create. Some compilers may not need that. – Krazy Glew Jan 29 '13 at 01:52
1

I'll suggest to use a more "generic" approach, instead of passing a container just receive two iterators.

Something like It remove_duplicates(It first, It last), and will return an iterator, so you can call like remove: v.erase(remove_duplicates(v.begin(), v.end()), v.end()).

template <typename It>
It remove_duplicate(It first, It last)
{
  It current = first;
  while(current != last) {
    // Remove *current from [current+1,last)
    It next = current;
    ++next;
    last = std::remove(next, last, *current);
    current = next;
  }
  return last;
}
Ismael
  • 2,995
  • 29
  • 45
  • Could you expand on your answer to explain why it is not safe to modify the container while keeping the iterators? – bsruth Jan 28 '09 at 22:50
  • +1 for a solution that avoids O(n^2) copies (even if the running time is still O(n^2) overall). (Note that by contrast, each call to erase() in the OP's solution shifts *all* subsequent objects down one position.) – j_random_hacker Jan 30 '09 at 02:37
0

Unrelated to your problem (which has been explained already), why is this a static function rather than residing globally in a namespace? This would be somewhat C++-ier.

Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
  • 1
    It has to do with migrating from a legacy code base. Eventually, I plan on doing what you suggest. – bsruth Jan 28 '09 at 20:28
0

I don't think that code compiles....

vectorToUpdate.erase where std::vector* vectorToUpdate.... does anyone else notice there is a * where there should be a &? that code is definitely not being compiled. if you're going to use a pointer to vector you must use '->' instead of '.' I know this is actually a bit nit picky but it points out that the compiler doesn't even care about your code...

Hippiehunter
  • 967
  • 5
  • 11