Deleted duplicate elements from large unsorted vector while retaining their order

Question

I need to remove duplicates from a large unsorted vector of size 8 million elements of the following form:

vector<unsigned> unsortedVec;

I need to retain the position of the unsorted elements after removing duplicates. For example, if my unsortedVec is:

6,98,1,938,98,736,0,1

Then after removing duplicates my unsorted vector should be:

6,98,1,938,736,0

i.e. only the later duplicate values are deleted and the order of the elements remain as it is.

For doing so I tried to create a set of unique elements present in the vector "unsortedVec" and continued iterating, if the elements were already present in the set then I did not insert them in "unsortedVec". Given the size of my unsortedVec, this is turning to be very slow. Is there some way by which I may delete duplicate elements from unsorted vec.

I tried the approach marked as duplicate question, however that too is turning to be very slow

Your example does not preserve the position (I understand: the index) of the remaining elements. For example, 736 is now at index 4 (0-based) while it was at index 5 originally. Do you mean "preserve the (un-)order"? And what would be hard about that? — Peter - Reinstate Monica, Aug 28 '15 at 16:51
@PeterSchneider Thanks for correcting, yes I mean preserving order. The hard thing about it is given the size of the vector, removing duplicates is turning to be very slow — Jannat Arora, Aug 28 '15 at 16:53
Ah. Well, this is not a complete answer, but typical other places where elements are "deleted" fast are file systems and memory allocators. They tend to *mark* elements as "deleted" which is fast, instead of moving data around. In your case you would either have a struct with the value plus a bool indicating "deleted" and put that in a single vector; or you would use two vectors, one with the values which is never touched after filling, and the second one being bool indicating which elements of the first one are deleted. — Peter - Reinstate Monica, Aug 28 '15 at 18:20

score 0 · Answer 1 · answered Aug 28 '15 at 16:51

When you delete element from vector all elements behind it should be moved. It will be faster to create another vector, reserve space there according size of first vector, and then copy all unique elements from your first vector (your approach for checking uniqueness is fine, just use unordered_set instead of set).

Deleted duplicate elements from large unsorted vector while retaining their order

1 Answers1