Removing duplicates from a non-sortable vector

Question

I'm looking for a way to remove duplicates from a vector (lets call him theGreatVector :D). I can't use std::sort followed by std::unique because there is no way to sort my objects.

theGreatVector contains some vector<Item*> (smallVectors)

I got an overload of == for vector<Item*> so i can use it

I'm able de create something in O(n²) but i need time efficiency (theGreatVector.size() could be 10⁵ or 10⁶)

Right now what i got is something like that (i fill my vector only if smallOne isnt in it) :

for(i=0;i<size;i++)
{
  vector<Item*>smallOne = FindFacets(i)
  if(smallOne doesnt belong to GreatOne) // this line already in O(n) :/
  {
     theGreatOne.push_back(smallOne);
  }
}

If there is a way to do that even in nlog(n) + n or anything lower than n², that'd be great !

Thanks a lot

Azh

If you have equality of values, it is likely that you can define some ordering too, and perform a sort. — juanchopanza, Apr 24 '13 at 09:46
what do you mean you can't sort your objects? you can always `std::tie` every data member into a `std::tuple` and use lexicographic ordering on that — TemplateRex, Apr 24 '13 at 09:46
What does your `==` do on `vector`? Does it compare `size` and pointer-values, or does it dereference the pointers and compare underlying value? Why do you think that `<` cannot work similarly, are `Item` strange in some way? By "duplicates" do you mean duplicate `vector`, or duplicate `Item*` in one of the `vector`, or duplicate `Item` in an `Item*` in one of the `vector` (I assume the first)? Is the order of `GreatOne` important? How often do you add to it? Read? Modify? In what pattern (lots of adds, then nothing but lots of reads?) — Yakk - Adam Nevraumont, Apr 24 '13 at 12:58
1_ == check if the 2 vectors got the same pointer in them, whatever their place is (for exemaple if i say that my items are (12 / 45 / 7 / 8) == return true even if its compared with (12 / 7 / 45 / 8) ) 2_i want to remove the vector duplicates , there is no duplicates in Item , Item are mesh nodes in a 3D mesh , with a lot of physical data in it(for example position,value of some unknwowns,link to adjacents nodes etc...) (if ur familiar with it i need to make some calculations on a morphological mesh). — Azhrilla, Apr 25 '13 at 07:21
3_ order is irrelevent in theGreatOne but relevent in smallOne. 4_ Great One is created & used once (lots of add / nothing / lots of read) 5_ i hope that its clearer 6_ excuse my poor english i'm not english & its early in the morning :) — Azhrilla, Apr 25 '13 at 07:22
did my answer help you? if not, please help me to improve it. — TemplateRex, May 01 '13 at 12:07
Yeah it helped a lot, i didnt use your solution but it gave me ideas on how to solve my problem. Finally i've use a lexicographic ordering to create an order between my objects, but i didnt use the tuples and stayed on vector. I've made an ordering between vector in fact. — Azhrilla, May 02 '13 at 11:53

TemplateRex · Accepted Answer · 2013-04-24T12:47:18.917

You can always std::tie every data member into a std::tuple and use lexicographic ordering on that to sort a vector of pointers to your big data structure. You can then do std::unique on that data structure before copying the output. With a small modification you could also remove the duplicates in place by sorting the big Item vector directly.

#include <tuple>
#include <memory>
#include <vector>

// tuples have builtin lexicographic ordering, 
// I'm assuming all your Item's data members also have operator<
bool operator<(Item const& lhs, Item const& rhs)
{
    return std::tie(lhs.first_data, /*...*/ lhs.last_data) < std::tie(rhs.first_data, /*...*/ rhs.last_Data);
}

int main()
{
   // In the Beginning, there was some data
   std::vector<Item> vec;
   // fill it

   // init helper vector with addresses of vec, complexity O(N)
   std::vector<Item*> pvec; 
   pvec.reserve(vec.size());
   std::transform(std::begin(vec), std::end(vec), std::back_inserter(pvec), std::addressof<Item>);

   // sort to put duplicates in adjecent positions, complexity O(N log N) 
   std::sort(std::begin(pvec), std::end(pvec), [](Item const* lhs, Item const* rhs){
       return *lhs < *rhs; // delegates to operator< for Item
   });       

   // remove duplicates, complexity O(N)
   auto it = std::unique(std::begin(pvec), std::end(pvec), [](Item const* lhs, Item const* rhs){
       return *lhs == *rhs; // assumes Item has operator==, if not use std::tuple::operator==
   });
   pvec.erase(it, std::end(pvec));

   // copy result, complexity O(N)
   std::vector<Item> result;
   result.reserve(pvec.size());
   std::transform(std::begin(pvec), std::end(pvec), std::back_inserter(result), [](Item const* pelem){
       return *pelem;
   });

   // And it was good, and done in O(N log N) complexity
}

Thanks a lot for this answer but it is quite hard for me to understand ^^ U propose to remove the smallvector & to use tuple instead right ? — Azhrilla, Apr 24 '13 at 11:13
@Azhrilla What I propose is that you use a `tuple` to do the comparison so that you can sort your vector. The `pvec` is just a helper data structure in case your `Item` is very expensive to swap. — TemplateRex, Apr 24 '13 at 11:53

Ilya Kobelevskiy · Answer 2 · 2013-04-24T09:58:02.173

0

Take a look at unordered set: http://www.cplusplus.com/reference/unordered_set/unordered_set/ it seems to do what you want. Insertions for single elements are done in O(1) on average, O(n) in worst case, only equality operator needs to be provided.

edited Apr 24 '13 at 09:58

answered Apr 24 '13 at 09:52

Ilya Kobelevskiy

5,245
4
24
41

Removing duplicates from a non-sortable vector

2 Answers2

Linked