6

Using STL algorithms (as much as possible) such as remove_if() and list::erase, is there a nice way to remove duplicates from a list defined as following:

list<int> l;

Please note that list::unique() only works if duplication occurs in consecutive elements. In my case, all duplicates have to be eliminated regardless of their position in the list. Moreover, removing duplicates mean preserving only one copy of each element in the final result.

EDIT: The option to l.sort() followed by l.unique() cannot be availed as that will destroy the order of the list.

Jaywalker
  • 3,079
  • 3
  • 28
  • 44
  • 4
    Well, obviously you could call `l.sort()` before calling `l.unique()`, but I assume there must be a reason why you can't do that? :) – hrnt Feb 03 '11 at 11:44
  • Not sure about STL algorithms, but the obvious way to do this is to iterate through the list building a hash set: if each element isn't in the set it is unique so add to set; if it is in the set it's a duplicate so remove from list. – Rup Feb 03 '11 at 11:45
  • Why don't you propose us some code of yours ? – Stephane Rolland Feb 03 '11 at 11:45
  • 1
    @hrnt I assume he wants to keep his current element order, notwithstanding duplicates – Rup Feb 03 '11 at 11:46
  • 1
    See http://stackoverflow.com/questions/4877504/how-can-i-remove-duplicate-values-from-a-list-in-c/4877575#4877575 – Fred Nurk Feb 03 '11 at 11:50

3 Answers3

8

If preserving the order of the list is not important, you can just do list.sort(); list.unique();

If the order is important, use Rup's suggestion:

list<int>::iterator iter = l.begin();
set<int> elements;
while (iter != l.end()) {
  if (elements.find(*iter) != elements.end())
    iter = l.erase(iter);
  else {
    elements.insert(*iter);
    ++iter;
  }
}
hrnt
  • 9,882
  • 2
  • 31
  • 38
  • 2
    otherwise: `if(elements.insert(*iter).second) ++iter else iter = l.erase(iter)`. set::insert returns a pair of which the second elements indicates whether insertion was successful, or failed because of a duplicate. – Benoit Feb 03 '11 at 11:55
  • aren't you testing the wrong `end()` on the line containing `find()`? – Hasturkun Feb 03 '11 at 12:16
  • @Hasturkun: yes I was. Fixed now :) – hrnt Feb 03 '11 at 12:30
  • Thanks for the response...but aren't loops very STL-"unlike"? – Jaywalker Feb 04 '11 at 10:37
  • @Jaywalker: Yeah, they are not very "STL-like". TheOm3ga's answer is more "STL-like" (with a few fixes). – hrnt Feb 04 '11 at 11:02
8

Using the list::remove_if member function, a temporary hashed set, and lambda expression.

std::list<int> l;
std::unordered_set<int> s;

l.remove_if([&](int n) {
    return (s.find(n) == s.end()) ? (s.insert(n), false) : true;
});
Blastfurnace
  • 18,411
  • 56
  • 55
  • 70
  • Note: This solution avoids the pitfall that I have noted on José Tomás Tocino's answer by having `s` be captured by reference . – M.M Oct 01 '14 at 01:09
6

He said he wanted to use the erase-remove idiom, so here's a possible way, using a function object:

struct Unifier{
    set<int> foundElements;

    bool operator()(int & a){
        if(foundElements.find(a) != foundElements.end()){
            return true;
        }else{
            foundElements.insert(a);
            return false;
        }
    }
};


int main(){
    list<int> v;

    v.push_back(5);
    v.push_back(4);
    v.push_back(5);
    v.push_back(3);
    v.push_back(5);
    v.push_back(3);

    copy (v.begin(), v.end(), ostream_iterator<int>(cout," "));

    Unifier u;
    v.remove_if(u);

    cout << endl << "After:" << endl;
    copy (v.begin(), v.end(), ostream_iterator<int>(cout," "));

}

Update: The above code has a subtle bug. According to C++11 [algorithms.general]/10:

[Note: Unless otherwise specified, algorithms that take function objects as arguments are permitted to copy those function objects freely. Programmers for whom object identity is important should consider using a wrapper class that points to a noncopied implementation object such as reference_wrapper<T> (20.8.3), or some equivalent solution. —end note ]

There appears to be no "otherwise specified" for std::list::remove_if, so this code may fail to remove all duplicates because it may create copies of the predicate at the start, and then use different copies of the predicate for different parts of the list. Example of this actually happening for std::remove_if.

A simple fix for C++11 is to replace v.remove_if(u) with:

v.remove_if( reference_wrapper<decltype(u)>(u) );

In C++03 I'm not sure if the above quote was present; but if it was then a fix would be to make foundElements be static, or to refactor Unifier so that all copies of it reference a single instance of foundElements.

Link to related question

Community
  • 1
  • 1
José Tomás Tocino
  • 9,873
  • 5
  • 44
  • 78
  • 1
    Why do you take the a parameter by reference? – Fred Nurk Feb 03 '11 at 12:06
  • 1
    You do not need to use erase-remove idiom with std::list. You could just call v.remove_if(u); Also, your foundElements does not need to be static. – hrnt Feb 03 '11 at 13:12
  • 2
    You actually don't want `foundElements` to be `static`. If you used the functor again, it would already have the set of entries from the previous run in it, which wouldn't give you correct results on future runs. – Zac Howland Feb 03 '11 at 13:19
  • Yep, that's what I thought at first, but it was giving me weird problems, so I used static in a hurry. But I agree it's supposed not to be necessary. – José Tomás Tocino Feb 03 '11 at 20:16
  • Please make these changes so that I can accept this as the most accurrate answer :) `v.remove_if (u)` should be used and `foundElements` need not be `static.` – Jaywalker Feb 04 '11 at 10:53