I assume that by "best" you mean the fastest approach that works correctly. Since this is a question about efficiency, I performed a simple benchmark to compare the efficiency of several algorithms. Note that they differ slightly in behaviour, because the problem is a bit underspecified - the questions that arise (and the assumptions taken for the benchmark) are:
- is it guaranteed that vDestination contains all elements from vSource? (assumption: no)
- are duplicates allowed in either vDestination or vSource? (assumption: yes, in both)
- does the order of the elements in the result vector matter? (algorithms for both cases tested)
- should every element from vDestination be removed if it is equal to any element from vSource, or only one-for-one? (assumption: yes; see the small example after this list)
- are the sizes of vDestination and vSource somehow bounded? Is one of them always bigger or much bigger? (several cases tested)
- in the comments it's already explained that the vectors don't need to be sorted, but I've included this point, as it's not immediately visible from the question (no sorting is assumed for either vector)
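To make the "one-for-one" point concrete, a tiny example of my own (not taken from the question): with vDestination = {1, 1, 2, 3} and vSource = {1}, removing every matching element yields {2, 3}, while one-for-one removal yields {1, 2, 3}. Of the two algorithms presented below, the set-based one removes every matching occurrence, while the counting one removes occurrences one-for-one.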
As you can see, there are a few points on which the algorithms will differ and, consequently, the best algorithm will depend on your use case. The compared algorithms are:
- the original one (proposed in the question) - baseline
- the one proposed in @dkg's answer
- the one proposed in @Revolver_Ocelot's answer + additional sorting (required by the algorithm) and pre-reservation of space for the result vector
- the one proposed in @Jarod42's answer
- a set-based algorithm (presented below - mostly an optimization of @Jarod42's algorithm)
- a counting algorithm (presented below)
set-based algorithm:
std::unordered_set<int> elems(vSource.begin(), vSource.end());

auto i = destination.begin();
auto target = destination.end();
while (i < target) {                       // note: <, not <=, so we never dereference past the kept range
    if (elems.count(*i) > 0)
        std::swap(*i, *(--target));        // move the matching element to the back
    else
        i++;
}
destination.erase(target, destination.end());
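As a side note, an equivalent way to express the same unstable-removal idea with the standard library is std::partition followed by erase. This is my own reformulation for illustration (it was not one of the benchmarked variants), and like the snippet above it does not preserve the order of the remaining elements:

#include <algorithm>
#include <unordered_set>
#include <vector>

// Illustrative reformulation of the set-based idea (not the benchmarked code):
// keep the elements not present in vSource at the front, then drop the rest.
void removeAllMatching(std::vector<int>& vDestination, const std::vector<int>& vSource) {
    const std::unordered_set<int> elems(vSource.begin(), vSource.end());
    auto keepEnd = std::partition(vDestination.begin(), vDestination.end(),
                                  [&elems](int v) { return elems.count(v) == 0; });
    vDestination.erase(keepEnd, vDestination.end());
}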
counting algorithm:
std::unordered_map<int, int> counts;
counts.max_load_factor(0.3);
counts.reserve(destination.size());

for (auto v : destination) {
    counts[v]++;                            // count occurrences in destination
}
for (auto v : source) {
    counts[v]--;                            // subtract occurrences found in source
}

auto i = destination.begin();
for (auto k : counts) {
    if (k.second < 1) continue;             // nothing of this value survives
    i = std::fill_n(i, k.second, k.first);  // write the surviving copies back into destination
}
destination.resize(std::distance(destination.begin(), i));
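For a compilable picture of the counting variant, here is a hedged, self-contained sketch around the snippet above; the wrapper function, its name and the sample data are mine, not part of the original benchmark:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <unordered_map>
#include <vector>

// Illustrative wrapper around the counting snippet above.
// Removes elements one-for-one and does not preserve the original order.
void removeCounted(std::vector<int>& destination, const std::vector<int>& source) {
    std::unordered_map<int, int> counts;
    counts.max_load_factor(0.3);
    counts.reserve(destination.size());
    for (auto v : destination) counts[v]++;
    for (auto v : source)      counts[v]--;

    auto i = destination.begin();
    for (const auto& k : counts) {
        if (k.second < 1) continue;
        i = std::fill_n(i, k.second, k.first);
    }
    destination.resize(std::distance(destination.begin(), i));
}

int main() {
    std::vector<int> destination{1, 1, 2, 3, 4};
    std::vector<int> source{1, 3, 5};
    removeCounted(destination, source);
    for (int v : destination) std::cout << v << ' ';   // prints 1, 2 and 4 in some order
}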
The benchmarking procedure was executed using the Celero library and was as follows:
- Generate n pseudo-random ints (n from the set {10, 100, 1000, 10000, 20000, 200000}) and put them into a vector
- Copy a fraction (m) of these ints to a second vector (fractions from the set {0.01, 0.1, 0.2, 0.4, 0.6, 0.8}, min. 1 element; see the sketch after this list)
- Start timer
- Execute removal procedure
- Stop timer
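The actual Celero fixtures live in the repository linked at the end; purely to illustrate the data-generation step described above, here is a sketch under the stated assumptions (the generator, seed and value range are my own choices, not necessarily what the benchmark uses):

#include <algorithm>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Illustrative data generation: n pseudo-random ints for the destination vector,
// and a fraction m of them copied into the source vector (at least one element).
std::pair<std::vector<int>, std::vector<int>> makeTestData(std::size_t n, double m) {
    std::mt19937 gen(42);                                  // fixed seed for repeatability
    std::uniform_int_distribution<int> dist(0, 1000000);

    std::vector<int> vDestination(n);
    std::generate(vDestination.begin(), vDestination.end(), [&] { return dist(gen); });

    std::size_t copied = std::max<std::size_t>(1, static_cast<std::size_t>(n * m));
    std::vector<int> vSource(vDestination.begin(), vDestination.begin() + copied);
    return {vDestination, vSource};
}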
Only algorithms 3, 5 and 6 were executed on datasets larger than 10,000 elements, as the rest of them took too long for me to comfortably measure (feel free to do it yourself).
Long story short: if your vectors contain fewer than 1000 elements, pick whichever approach you prefer. If they are longer, rely on the size of vSource. If it's less than 50% of vDestination's size, choose the set-based algorithm; if it's more, sort the vectors and pick @Revolver_Ocelot's solution (they tie around 60%, with the set-based one being over 2x faster when vSource is 1% of the size of vDestination). Please don't rely on the order of elements or provide a vector that is sorted from the beginning - the requirement that the ordering has to remain the same slows the process down dramatically. Benchmark on your use case, with your compiler, your flags and your hardware. I've attached a link to my benchmarks, in case you want to reproduce them.
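I am not reproducing @Revolver_Ocelot's exact code here; as one concrete example of the "sort first" family recommended above, a std::set_difference-based sketch (with the pre-reserved result vector mentioned in the algorithm list) could look like this - my own illustration, not necessarily the benchmarked code:

#include <algorithm>
#include <iterator>
#include <vector>

// Illustrative sort-based removal (my sketch, not necessarily the benchmarked code).
// Removes elements one-for-one and leaves the survivors in sorted order.
void removeSorted(std::vector<int>& vDestination, std::vector<int> vSource) {
    std::sort(vDestination.begin(), vDestination.end());
    std::sort(vSource.begin(), vSource.end());

    std::vector<int> result;
    result.reserve(vDestination.size());                   // pre-reserve space for the result
    std::set_difference(vDestination.begin(), vDestination.end(),
                        vSource.begin(), vSource.end(),
                        std::back_inserter(result));
    vDestination.swap(result);
}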
Complete results (file vector-benchmarks.csv) are available on GitHub together with the benchmarking code (file tests/benchmarks/vectorRemoval.cpp) here.
Please keep in mind that these are the results I obtained on my computer, with my compiler etc. - in your case they will differ (especially the points at which one algorithm becomes better than another).
I've used GCC 6.1.1 with -O3 on Fedora 24, on top of VirtualBox.