5

I am new to c++ and stuck to a problem. I am using list for storing string values. now i want to remove the duplicate values from that string. Can anyone tell me how do this.

Any sample code will be highly appreciate.

user229044
  • 232,980
  • 40
  • 330
  • 338
shekhar
  • 59
  • 1
  • 1
  • 2

5 Answers5

16

Use sort followed by unique.

Alexandre C.
  • 55,948
  • 11
  • 128
  • 197
  • 1
    LOL! I need to read the reference more myself apparently. I didn't even know that was there. – Edward Strange Feb 02 '11 at 17:16
  • 1
    @Noah: to be honest, I realized the existence of these member functions by googling the global `std::unique` from ``. – Alexandre C. Feb 02 '11 at 17:17
  • @Noah: the question is, when do you really use a `list` ;) ? – Matthieu M. Feb 02 '11 at 17:44
  • 2
    @MatthieuM.: When you need to slice and splice, or need insertion/removal anywhere without invalidating any other iterators. :) – Fred Nurk Feb 02 '11 at 18:31
  • @Fred: this was rhetorical actually ;) I think the iterator invalidation is the strongest point anyway, but then I usually work with small (~5 ~20) datasets so I can splice vectors as easily. – Matthieu M. Feb 03 '11 at 07:14
  • @MatthieuM.: I figured it was, but people so often forget about list splicing without any invalidation that it's fun to point out sometimes. – Fred Nurk Feb 03 '11 at 09:39
  • @Fred: Actually, I think it's the only example of "move" semantics in C++03 :) – Matthieu M. Feb 03 '11 at 09:46
14

If the list is sorted, use its unique method.

If the list isn't sorted (and you don't want to sort it):

set<string> found;
for (list<string>::iterator x = the_list.begin(); x != the_list.end();) {
  if (!found.insert(*x).second) {
    x = the_list.erase(x);
  }
  else {
    ++x;
  }
}

To avoid copying the strings into the set:

struct less {
  template<class T>
  bool operator()(T &a, T &b) {
    return std::less<T>()(a, b);
  }
};
struct deref_less {
  template<class T>
  bool operator()(T a, T b) {
    return less()(*a, *b);
  }
};

void remove_unsorted_dupes(list<string> &the_list) {
  set<list<string>::iterator, deref_less> found;
  for (list<string>::iterator x = the_list.begin(); x != the_list.end();) {
    if (!found.insert(x).second) {
      x = the_list.erase(x);
    }
    else {
      ++x;
    }
  }
}
Fred Nurk
  • 13,952
  • 4
  • 37
  • 63
  • 3
    +1 for an example that doesn't destroy the existing order – bdonlan Feb 02 '11 at 17:25
  • Is there a common name for this `deref_less` comparator? I've been using one myself and had to think how to name it :S – Roman L Feb 02 '11 at 17:39
  • 1
    @7vies: Deref_less makes sense to me: deref then (std::)less. You could (and probably should) write it to actually use std::less: `return std::less(*a, *b);`. – Fred Nurk Feb 02 '11 at 17:49
  • Interesting use of decltype, gotta remember that one! – Xeo Feb 02 '11 at 18:10
  • @Xeo: It's the entire purpose of decltype. :) But it's 0x-only. In portable C++03, I'd write `struct my_less { template operator()(T &a, T &b) { return std::less()(a, b); } }; struct deref_less { template bool operator()(T a, T b) { return my_less()(*a, *b); } };` to avoid compiler-specific pre-0x equivalents to decltype. (I also missed a set of parens in the comment.) – Fred Nurk Feb 02 '11 at 18:32
8

If you have an std::list you can remove duplicates with:

yourlist.sort();
yourlist.unique();
ltjax
  • 15,837
  • 3
  • 39
  • 62
4

Use unique().

But first sort() the list, or unique won't do what you expect.

tpdi
  • 34,554
  • 11
  • 80
  • 120
0

Solution 1:

struct already_found
{
  std::set<std::string> & theSet;

  bool operator()(const std::string& s) const
  {
     return !theSet.insert(s).second;
  }
};

std::set<std::string> theSet;
the_list.remove_if( the_list.begin(), the_list.end(), already_found(theSet) );

Solution 2 using shared_ptr

struct already_found
{
  boost::shared_ptr<std::set<std::string> > theSet;
  already_found() : theSet( new boost::shared_ptr<std::set<std::string> > )
  {
  }

  bool operator()(const std::string& s) const
  {
     return !theSet->insert(s).second;
  }
};

the_list.remove_if( the_list.begin(), the_list.end(), already_found(theSet) );

These both have the disadvantage of having to copy all the strings. You can slightly optimise this by storing pointers to the strings and comparing them with a custom compare.

CashCow
  • 30,981
  • 5
  • 61
  • 92
  • Already_found::op() is const, but it modifies theSet. – Fred Nurk Feb 02 '11 at 18:34
  • 1
    @Fred Nurk It's allowed to modify it. In one case it's a ref. In the other it's in a shared_ptr. The const does not propagate to it in either case. – CashCow Feb 03 '11 at 00:25