5

This question is related to existing Question: fast way to copy one vector into another

I have a vector source vector S and I want to create a destination vector D which has only those elements of S which satisfy a particular condition(say element is even). Note that source vector is constant vector.

I can think of two STL algorithms to do this :

  • copy_if
  • remove_if

In both methods, I will need to make sure the destination vector D is of big enough size. So, I will need to create initially vector D of the same size as S. Also, in both methods, I want to compact the vector D to be of the same length as the number of elements in it. I donot know which one of them is faster or more convenient but I dont know any better way to copy a vector conditionally ?

Community
  • 1
  • 1
nurabha
  • 1,152
  • 3
  • 18
  • 42
  • 5
    This a good and well-versed question. What puzzles me is why you didn't measure or research it? – Sebastian Mach Jul 22 '14 at 08:57
  • 3
    `In both methods, I will need to make sure the destination vector D is of big enough size. ` Not if you use something like `back_inserter`. – user657267 Jul 22 '14 at 09:03
  • @user657267: you mean like this - copy(S.begin(),S.end(),back_inserter(D), MyPredicate ). Wouldn't this be same as conditional push back ? – nurabha Jul 22 '14 at 09:06
  • What do you mean by `compact`: make sure `.size()` returns the exact number of elements, or make sure `.capacity()` and `.size()` return the same results ? Note that the latter is not achievable (in a Standard way) as an implementation can keep capacity slightly bigger than size (for example, because it knows the underlying memory allocator). – Matthieu M. Jul 22 '14 at 09:38
  • @MatthieuM.: Size should return exact number of elements. By compact I meant it should use as much memory to fit exact number elements. – nurabha Jul 22 '14 at 09:44

4 Answers4

7

The simplest way is:

auto const predicate = [](int const value) { return value % 2 == 0; };
std::copy_if(begin(src), end(src), back_inserter(dest), predicate);

which relies on push_back.

Now, indeed, this may trigger memory reallocation. However I'd like to underline that push_back has amortized constant complexity, meaning that in average it is O(1), which is achieved by having an exponential growth behavior (so that the number of allocations performed is O(log N)).

On the other hand, if you have 1 million elements, only 5 of which being even, it will not allocate 4MB of memory up-front, only to relinquish it for only 20 bytes later on.

Therefore:

  • it's optimal when the distribution is skewed toward odd numbers, because it does not over-allocate much
  • it's close to optimal otherwise, because it does not reallocate much

Even more interesting, if you have an idea of the distribution up-front, you can use resize and shrink_to_fit:

// 90% of the time, 30% of the numbers are even:
dest.reserve(src.size() * 3 / 10);

auto const predicate = [](int const value) { return value % 2 == 0; };
std::copy_if(begin(src), end(src), back_inserter(dest), predicate);

dest.shrink_to_fit();

This way:

  • if there were less than 30%, shrink_to_fit might trim the excess
  • if there were 30%, bingo
  • if there were more than 30%, re-allocations are triggered as necessary, still following that O(log N) pattern anyway

Personal experience tells me that the call to reserve is rarely (if ever) worth it, amortized constant complexity being really good at keeping costs down.

Note: shrink_to_fit is non-binding, there is no guaranteed way to get the capacity to be equal to the size, the implementation chooses what's best.

Community
  • 1
  • 1
Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
5

Well, you could use back_inserter:

std::vector<int> foo = {...whatever...};
std::vector<int> bar;
std::back_insert_iterator< std::vector<int> > back_it (bar);

std::copy_if (foo.begin(), foo.end(), back_it, MyPredicate);

or count element:

std::vector<int> foo = {...whatever...};
int mycount = count_if (foo.begin(), foo.end(), MyPredicate);
std::vector<int> bar (mycount);

std::copy_if (foo.begin(), foo.end(), bar.begin(), MyPredicate );

A third solution:

std::vector<int> foo = {...whatever...};
std::vector<int> bar (foo.size());

auto it = std::copy_if (foo.begin(), foo.end(), bar.begin(), MyPredicate );
bar.resize(std::distance(bar.begin(),it));
Gall
  • 1,595
  • 1
  • 14
  • 22
  • Nice. In the first method there is overhead of memory re-allocation during insertion. In second method, we have double the number of load operations(not good) as we scan the vector twice but we save on overheads of memory re-allocation – nurabha Jul 22 '14 at 09:24
  • +1 Given "...compact the vector D to be of the same length as the number of elements in it", the first would need a `.shrink_to_fit()` or (for C++03) `.swap(std::vector())`, and the second approach is likely faster (unless an extra check of the predicate costs more than ~2 extra copies in memory). – Tony Delroy Jul 22 '14 at 09:28
  • If you have a guesstimate of possible output size, you can reserve the vector in the first method. – Neil Kirk Jul 22 '14 at 09:28
  • The first block as a bug in it, the back_insert_iterator should take bar as an argument, not foo – Mustafa Ozturk Nov 03 '16 at 14:36
  • @MustafaOzturk You're right, thanks for the comment! – Gall Nov 16 '16 at 08:27
1

copy_if and remove_if have different semantics. The former needs a separate destination vector for the matching items

copy_if(begin(src), end(src), back_inserter(dst), myPred());

whereas the latter removes the non-matching items but they still have to be erased ( the remove-erase idiom)

src.erase(remove_if(begin(src), end(src), std::not1(myPred()), end(src));

If you want to have a separate destination vector, you need to

remove_copy_if(begin(src), end(src), back_inserter(dst), std::not1(myPred()));

This should be equally expensive as copy_if. I would find it more confusing because of the double negative (remove if not, vs copy if).

TemplateRex
  • 69,038
  • 19
  • 164
  • 304
  • Problem is that in both methods I will have to do an erase to compact the destination vector and in both cases I have to have a big enough destination vector. – nurabha Jul 22 '14 at 09:27
  • 1
    @nurabha: `erase` never compacts the vector (iterators before the point of erasure are guaranteed valid afterwards) - `shrink_to_fit` may with C++11 but isn't required to. – Tony Delroy Jul 22 '14 at 09:55
  • Ok, so erase just modifies the size without changing capacity and returns the new end iterator while invalidating all iterators after new end to the old end – nurabha Jul 22 '14 at 11:02
0

I would personally recommend using copy_if(). The nice thing about it is that it returns the output iterator at which it stopped copying. Here's an example for the even number case you mentioned:

vector<int> src;

// initialize to numbers 1 -> 10
for(int i = 0; i < 10; ++i) {
    src.push_back(i);
}

// set initial size to v1.size()
vector<int> dest(src.size());

// use copy_if
auto it = copy_if(src.begin(), src.end(), dest.begin(), [](int val){
    return val %  2 == 0;
});

dest.resize(dest.end() - it);

In this way, you will only need to resize once.

Nasser Al-Shawwa
  • 3,573
  • 17
  • 27
  • `dest.resize(dest.end() - it);` + "you will only need to resize once." - not true, as resizing can happen (probably ~`log2N` times) while `copy_if` populates `dest`, then the `resize` is *guaranteed* not to reduce `capacity()` (as it's specified in terms of `pop_back()` which never invalidates earlier iterators) - `shrink_to_fit` is a better bet, though may be a no-op in some implementations. – Tony Delroy Jul 22 '14 at 09:59
  • `copy_if` doesn't do any resizing. It relies on you to provide size guarantees, so resizing indeed happens only once. As for the actual capacity of the container, (vs the value returned by `vector::size ()` ) is not important (OP hasn't made that clear). I think suggesting shrink_to_fit is not advisable partly because of your reasoning – Nasser Al-Shawwa Jul 22 '14 at 10:54