5

I'm looking for a way to remove duplicates from a vector (lets call him theGreatVector :D). I can't use std::sort followed by std::unique because there is no way to sort my objects.

theGreatVector contains some vector<Item*> (smallVectors)

I got an overload of == for vector<Item*> so i can use it

I'm able de create something in O(n²) but i need time efficiency (theGreatVector.size() could be 10⁵ or 10⁶)

Right now what i got is something like that (i fill my vector only if smallOne isnt in it) :

for(i=0;i<size;i++)
{
  vector<Item*>smallOne = FindFacets(i)
  if(smallOne doesnt belong to GreatOne) // this line already in O(n) :/
  {
     theGreatOne.push_back(smallOne);
  }
}

If there is a way to do that even in nlog(n) + n or anything lower than n², that'd be great !

Thanks a lot

Azh

Azhrilla
  • 73
  • 4
  • If you have equality of values, it is likely that you can define some ordering too, and perform a sort. – juanchopanza Apr 24 '13 at 09:46
  • what do you mean you can't sort your objects? you can always `std::tie` every data member into a `std::tuple` and use lexicographic ordering on that – TemplateRex Apr 24 '13 at 09:46
  • What does your `==` do on `vector`? Does it compare `size` and pointer-values, or does it dereference the pointers and compare underlying value? Why do you think that `<` cannot work similarly, are `Item` strange in some way? By "duplicates" do you mean duplicate `vector`, or duplicate `Item*` in one of the `vector`, or duplicate `Item` in an `Item*` in one of the `vector` (I assume the first)? Is the order of `GreatOne` important? How often do you add to it? Read? Modify? In what pattern (lots of adds, then nothing but lots of reads?) – Yakk - Adam Nevraumont Apr 24 '13 at 12:58
  • 1_ == check if the 2 vectors got the same pointer in them, whatever their place is (for exemaple if i say that my items are (12 / 45 / 7 / 8) == return true even if its compared with (12 / 7 / 45 / 8) ) 2_i want to remove the vector duplicates , there is no duplicates in Item , Item are mesh nodes in a 3D mesh , with a lot of physical data in it(for example position,value of some unknwowns,link to adjacents nodes etc...) (if ur familiar with it i need to make some calculations on a morphological mesh). – Azhrilla Apr 25 '13 at 07:21
  • 3_ order is irrelevent in theGreatOne but relevent in smallOne. 4_ Great One is created & used once (lots of add / nothing / lots of read) 5_ i hope that its clearer 6_ excuse my poor english i'm not english & its early in the morning :) – Azhrilla Apr 25 '13 at 07:22
  • did my answer help you? if not, please help me to improve it. – TemplateRex May 01 '13 at 12:07
  • Yeah it helped a lot, i didnt use your solution but it gave me ideas on how to solve my problem. Finally i've use a lexicographic ordering to create an order between my objects, but i didnt use the tuples and stayed on vector. I've made an ordering between vector in fact. – Azhrilla May 02 '13 at 11:53
  • good to know you got something that works for you! – TemplateRex May 02 '13 at 12:07

2 Answers2

0

You can always std::tie every data member into a std::tuple and use lexicographic ordering on that to sort a vector of pointers to your big data structure. You can then do std::unique on that data structure before copying the output. With a small modification you could also remove the duplicates in place by sorting the big Item vector directly.

#include <tuple>
#include <memory>
#include <vector>

// tuples have builtin lexicographic ordering, 
// I'm assuming all your Item's data members also have operator<
bool operator<(Item const& lhs, Item const& rhs)
{
    return std::tie(lhs.first_data, /*...*/ lhs.last_data) < std::tie(rhs.first_data, /*...*/ rhs.last_Data);
}

int main()
{
   // In the Beginning, there was some data
   std::vector<Item> vec;
   // fill it

   // init helper vector with addresses of vec, complexity O(N)
   std::vector<Item*> pvec; 
   pvec.reserve(vec.size());
   std::transform(std::begin(vec), std::end(vec), std::back_inserter(pvec), std::addressof<Item>);

   // sort to put duplicates in adjecent positions, complexity O(N log N) 
   std::sort(std::begin(pvec), std::end(pvec), [](Item const* lhs, Item const* rhs){
       return *lhs < *rhs; // delegates to operator< for Item
   });       

   // remove duplicates, complexity O(N)
   auto it = std::unique(std::begin(pvec), std::end(pvec), [](Item const* lhs, Item const* rhs){
       return *lhs == *rhs; // assumes Item has operator==, if not use std::tuple::operator==
   });
   pvec.erase(it, std::end(pvec));

   // copy result, complexity O(N)
   std::vector<Item> result;
   result.reserve(pvec.size());
   std::transform(std::begin(pvec), std::end(pvec), std::back_inserter(result), [](Item const* pelem){
       return *pelem;
   });

   // And it was good, and done in O(N log N) complexity
}
TemplateRex
  • 69,038
  • 19
  • 164
  • 304
0

Take a look at unordered set: http://www.cplusplus.com/reference/unordered_set/unordered_set/ it seems to do what you want. Insertions for single elements are done in O(1) on average, O(n) in worst case, only equality operator needs to be provided.

Ilya Kobelevskiy
  • 5,245
  • 4
  • 24
  • 41