Remove duplicated members from vector while maintaining order

Question

I know this question has been asked a lot, but I could not find the best (the most efficient) way to remove duplicated members (type double) from vector while maintaining 1 copy and order of the original vector.

I have a vector of point (defined type) in which probably some points have been inserted twice during process. I want to delete/remove the copied points (only the second copy, I mean I want to have the first copy). and also I want to keep the order of original vector. — H'H, Oct 22 '13 at 10:19
@AliAlamiri: The OP didn't state if they had control over how and where the vector is set up. Otherwise, a set is at least semantically correct. — thokra, Oct 22 '13 at 10:30
@H'H that is what you want. But what have you tried? Did you try anything? Or did you just look at the problem, shrug, and ask here? — Yakk - Adam Nevraumont, Oct 22 '13 at 10:47
@Yakk for sure I tried it before asking, but since I have some constraints (e.g. keeping original vector order), still am not sure what is the best. I also need to work with the indices of the original vector members afterwords. — H'H, Oct 22 '13 at 10:57

score 3 · Accepted Answer · answered Oct 22 '13 at 12:35

If your data was not doubles, simply doing a pass with an unordered_set keeping track of what you have already seen via the remove_if-erase idiom would work.

doubles are, however, bad news when checking for equality: two derivations that you may think should produce the same value may generate different results. A set would allow looking for nearby values. Simply use equal_range with a plus minus epsilon instead of find to see if there is another value approximetally equal to yours earlier in the vector, and use the same remove erase idiom.

The remove erase idiom looks like:

vec.erase( std::remove_if( vec.begin(), vec.end(), [&](double x)->bool{
  // return true if you want to remove the double x
}, vec.end());

in C++03 it cannot be done inline.

The lambda above will be called for each element in order, like the body of a loop.

Grimm The Opiner · Answer 2 · 2013-10-22T10:46:00.153

If you have to/wish to use a vector* then it might be easiest to trap duplicates on insertion - if the point to be inserted is already there, bin it.

Another approach for a really large collection would be to do a sort and search for duplicates after every N insertions, where N is the perfect number of insertions to wait before doing a sort and search for duplicates. (Calculating N is left as an exercise for the reader.)

Your approach, and the value of N if relevant, depends on the number of elements, how often the array is changed, how often the contents are examined, and the likelihood of duplicates occuring.

(*apparently, vectors are great as their disadvantages lie where modern computers tend to kick butt so hard it doesn't matter, and are blisteringly fast with linear searches. At least I think that's what Bjarn's saying here comparing a vector to a linked list.)

Remove duplicated members from vector while maintaining order

2 Answers2