7

As far as the algorithm is concerned, removing a set of element from a contiguous array can be done effectively in two parts.

  1. Move all the elements not to be deleted to the front of the array.
  2. Mark the array smaller.

This can be done in C++ with the erase-remove idiom.

vector<T> v; // v = {0,1,2,3,0,0,7};
vector<T>::iterator it = remove(v.begin(),v.end(),e);
// move all elements not to be deleted to the front
// Yes, remove is not the brightest name for that.
// Especially as list::remove really remove elements from the list.
// now v = {1,2,3,7,?,?,?} results marked in question marks
// are implementation dependent.
v.erase(it,v.end());
// get rid of the elements marked as question marks.
// v = {1,2,3,7}

Now, the content of the elements in the question mark is unknown. The only thing we can do with them is get rid of them (by overwriting them, or by removing them).

Is there a real world situation where you need to use remove without erase? The only situation I could think of is

copy(src.begin(),src.end(),remove(v.begin(),v.end(),e),v.end());

replacing all As with Bs, and requiring that all those new Bs will be contiguous. Doesn't make too much sense.

edit: Does it make sense to anything other than contigious memory container (deque and vector actually)?

If indeed I'm correct, why was it implemented as a standalone algorithm, instead of vector::remove_if, dequeue::remove_if etc.

Elazar Leibovich
  • 32,750
  • 33
  • 122
  • 169
  • @MatthieuM: Not really. They're *unspecified* actually. See this : http://stackoverflow.com/questions/6456870/stl-remove-doesnt-work-as-expected/6456967#6456967 – Nawaz Oct 06 '11 at 13:46

5 Answers5

6

Your question goes the wrong way round. Rather, one would ask "why should I implement this identical algorithm repetitively for each container, instead of making it a single, free function"?

The key idea behind separating containers, iterators and algorithms is that you don't have a complexity explosion as you write more algorithms and more containers. If you want to write a new container that has contiguous storage, you can use remove on it right out of the box (provided you supply forward or random-access iterators), without duplicating any code.

The only reason that certain containers implement their own, member versions of algorithms is because either they can do it better than the generic version (e.g. std::set::find vs. std::find; or list::remove), or because they can do what the generic version cannot do (e.g. std::list::sort vs. std::sort). But if you can, you should use the free version for maximal generality and genericity.

Kerrek SB
  • 464,522
  • 92
  • 875
  • 1,084
  • *provided you supply random-access iterators* `remove()` doesn't require random access iterator. – GreenScape Oct 06 '11 at 14:00
  • 1
    I don't see how having a `list<>::remove()` function should imply deleting code anyways. It would be perfectly fine to have containers provide convenience methods such as `vector<>::for_each()` that are implemented in terms of `std::for_each()` and other standard algorithms unless the implementation can do better. It would certainly reduce a lot of typing for the common case. – André Caron Oct 06 '11 at 14:01
  • @Kerrek, I'm not opposed to having _Remove_if internally. But I don't see (1) who needs them other than vector-like container, (2) Why do your API leaves more work for the user? As Josh Bloch put it "Don't make the client do anything the library could do http://goo.gl/FjFAv". Someone will have to write the erase-remove idiom for vectors. And this someone better be the API once, and not the user 100 times. You can leave the `remove` functions for implementing vector-like structures, but add `vector::remove`! – Elazar Leibovich Oct 06 '11 at 14:01
  • @AndréCaron: I'm not sure I follow your objections to `list::remove()`. It is a rather unique function. It's different from, say, `set::erase()`, which erases by value, because the `list` version allows for an arbitrary comparator, while the `set` version uses the set's mandated structure. It's also different from the free algorithm, because it takes advantage of the list's structure to perform the removal more efficiently. So a member function is the only sensible way to take advantage of all this, and calling it `remove` isn't all that confusing. – Kerrek SB Oct 06 '11 at 14:14
  • @AndréCaron: Having a container member function `for_each` is a lot less flexible than the free algorithm, since you don't get to specify a *range*. At best, you could imply beginning to end. There's nothing fundamental *stopping* you from putting that in the library, but I suppose it was considered too arbitrary and specific, with nothing really lost by requiring you to use the free function, and a lot of uniformity gained. – Kerrek SB Oct 06 '11 at 14:16
  • @Kerrek, while I can agree on `for_each`, I can see very little reason to use `remove` outside of the `remove-erase` idiom, or outside of `vector`/`deque`. So I see no need for having that avalable separately. – Elazar Leibovich Oct 06 '11 at 14:22
  • @KerrekSB: I know it's less flexible, and I don't object to `list<>::remove()`. Read what I wrote again. I didn't say we should substitute the standard `for_each()` algorithm for a container member function. I said that these two things are orthogonal: you can have a generic `for_each()` function that requires a range *and* have a member function that requires less typing for the common case *without duplicating code*. Consider: `Op std::vector::for_each(Op op){return std::for_each(begin(), end(), op);}` – André Caron Oct 06 '11 at 14:24
4

First of all: actually implementing remove_if as member methods would have meant duplicating the code again and again. I always considered that the mistake was in list (why is there a remove: MarkB answered this, in C++03 it's more efficient, and in C++11 it's slightly more efficient)

This is the design goal of the STL: separate data structures and algorithms so that if you have N structures and M algorithms, you do not have N*M implementations of the algorithms, but only N+M which is definitely more manageable.

Notably, the specification of remove that the elements "left out" have unspecified value allows the new C++11 move operations to be applied for efficient removal.

Now, I'll admit that most of the times, it is used as part of the remove-erase idiom. No one prevents you from creating your own:

 template <typename C>
 void erase(C& container, typename C::const_reference r) {
   container.erase(std::remove(container.begin(), container.end(), r), container.end());
 }
Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
  • `list::remove` exists because it can use its knowledge of the list to unlink the node(s). – Mark B Oct 06 '11 at 13:53
  • Indeed, `list<>::remove()` can remove items without copying over the other items. – André Caron Oct 06 '11 at 13:57
  • I don't see how having a `list<>::remove()` function should imply deleting code anyways. It would be perfectly fine to have containers provide convenience methods such as `vector<>::for_each()` that are implemented in terms of `std::for_each()` and other standard algorithms unless the implementation can do better. It would certainly reduce a lot of typing for the common case. – André Caron Oct 06 '11 at 14:00
  • @MarkB: hum, I get it now thanks. In C++03 you need copy assignment to transfer the data which is inefficient, so it makes sense in node based containers to move the nodes around instead. – Matthieu M. Oct 06 '11 at 14:00
  • 1
    See my comment above. "Don't make the client do anything the library could do goo.gl/FjFAv";. Someone will have to write the erase-remove idiom for vectors. And this someone better be the API once, and not the user 100 times. Why do you need the `remove` functions not for `vector`s or `deque`s anyhow? Putting them in the standard just for someone who happens to implement one seems a mistake to me. – Elazar Leibovich Oct 06 '11 at 14:03
  • @AndréCaron: I'd rather the STL algorithms were implemented in term of ranges instead of pairs of iterators (or at least, provided such an overload). This would save as much typing without creating monster interfaces like that of `std::string`. – Matthieu M. Oct 06 '11 at 15:28
  • @ElazarLeibovich: I guess it's a matter of philosophy then. I am all for minimalistic design and tooling, and I abhor monster interfaces and framework. Therefore the STL rather suits me (in general), though I still wish they had specified a number of their algorithms in terms of ranges instead of blunt iterators. I also much prefer free-functions to member functions: it favors encapsulation, reuse and uniformity. – Matthieu M. Oct 06 '11 at 15:32
  • @MatthieuM. No relation to philosophy. I also prefer minimalistic design. However, giving a function you must never use on its own, is a source for errors, and should never be done. `remove` should never be used without `erase`, so you must not expose it to your clients. The code for `erase` you wrote, must be implemented by each and every user or the `remove` function in the stl. So I see no reason not to expose only the correct function (nevermind if as member function or as another free function). (And I'm all for ranges BTW. same reason. having begin w/o end makes no sense - so avoid it!) – Elazar Leibovich Oct 06 '11 at 19:02
1

Yes. A simple counter-example:

void bar {
  // Get vector of T's from foo()
  vector<T> v = foo();
  // Print only non-zero elements.
  vector<T>::iterator it = remove(v.begin(),v.end(), T(0));
  // I could call erase here, but why bother?
  std::copy(v.begin(), it, std::ostream_iterator<T>(std::cout));
}

If I'm going to use only the selected part of the vector, and have no other uses for the vector, I don't need to erase the vector. All elements will be destructed anyway when the vector goes out of scope.

MSalters
  • 173,980
  • 10
  • 155
  • 350
  • You don't need to erase the vector, but erasing it is an excellent idea. Since the next programmer might use those unknown variables after your copy for some reason. The price of destructing those elements seems negligible in 99% of the cases, and having accessible objects in an unspecified state is a big price to pay for that. – Elazar Leibovich Oct 06 '11 at 14:44
  • 1
    And BTW, I don't see what are you gaining by not erasing those elements. You're doing identical work (in both cases all element would be destroyed), but in a more weird and error prone way. – Elazar Leibovich Oct 06 '11 at 18:58
0

If, for some reason, you don't want to re-size your container, but just remove some elements and later set some other values to replace the removed space, then you could just use remove.

As I understand, the value of removed elements are unspecified after removal, so you can't use their data. They are kind of removed, but the space isn't freed.

Juho
  • 983
  • 5
  • 9
  • You can safely erase elements from the vector without resizing it. http://stackoverflow.com/questions/5496692/how-to-remove-elements-from-stdvector-without-resizing-it – Elazar Leibovich Oct 06 '11 at 14:05
  • @ElazarLeibovich: it might still be much more efficient to copy-assign than to destruct an existing item and then copy-construct an item at the same place. – André Caron Oct 06 '11 at 14:29
  • @AndréCaron, I agree that it might be needed in some very rare cases, even so it's horrible style. But generally speaking having accessible objects which are in unknown state, just so that you wouldn't have to call their destructor, sounds like a bad idea. So I believe forcing the user to implement his own `remove` in those extra rare cases, is not too bad. – Elazar Leibovich Oct 06 '11 at 14:41
  • @ElazarLeibovich: I also agree that it's bad style, but it's consistent with the general "don't pay unless you want to" philosophy of C++ and its standard library. – André Caron Oct 06 '11 at 15:07
  • @AndréCaron, it doesn't contradict this philosophy. You can always implement it yourself in the super rare case that you'll need it. You don't have to pay. But there's no reason that all developers are stuck with inferior standard, for the 0.00001% (literally), that needs that. I really doubt that there's even a single real world case that could not use the erase-remove idiom due to performance. (If your tight on performance, you'll probably use something like strinx, more than something like STL.) So I really think this "issue" is bogus. – Elazar Leibovich Oct 06 '11 at 18:57
0

Yes. first of all remove() is container independent. You can use it on 2 iterators of any container and should not do things like:

if(<iterator is vector iterator>)
{
   vector::remove_if(...);
}
else(<iterator is dequeue iterator>)
{
   dequeue::remove_if(...);
}
else(<iterator is list iterator>)
{
   list::remove_if(...);
}
// etc

Yes there is reason to not perform erase(). For vector it's making it smaller and therefore performing memory reallocation. What's not always desired because that will cause a lot of copy constructors for other elements of vector.

GreenScape
  • 7,191
  • 2
  • 34
  • 64
  • You can use it on all containers, but I don't see any reason you'd want to use it on anything except `vector` or `deque`. See link below, erase will not resize your vector. http://stackoverflow.com/questions/5496692/how-to-remove-elements-from-stdvector-without-resizing-it – Elazar Leibovich Oct 06 '11 at 14:07
  • @Elazar Leibovich you are wrong, mister. http://www.cplusplus.com/reference/stl/vector/erase/ *This effectively reduces the vector size by the number of elements removed, calling each element's destructor before.* As long as data in vector is contiguous there is no way to destruct/construct object w/o memory freeing/allocation. Unless new/delete are used of course. I highly doubt about using last ones. – GreenScape Oct 06 '11 at 14:15
  • Umm.. I think it is possible to call ctor/dtor without memory allocation, with placement new/delete. – Elazar Leibovich Oct 06 '11 at 14:20
  • sorry. i meant to say *Unless placement new/delete are used of course*. I doubt they are used there. What about memory alignment? – GreenScape Oct 06 '11 at 14:24
  • @GreenScape: It reduces the *size* of the vector, not it's *capacity*. Elements are added to and removed from the end of the vector with placement new and direct destructor call in order to achieve amortized constant-time for `.push_back()`. This is required by the standard. – André Caron Oct 06 '11 at 14:31
  • @GreenScape: memory alignment is not an issue. The standard requires that a pointer to `T` is always suitably aligned for type `T` and `operator new()` always returns a pointer suitably aligned for any object. – André Caron Oct 06 '11 at 14:33
  • @GreenScape, Even without using placement new/delete. You can call the elements destructor directly. http://codepad.org/V0JrZLdm – Elazar Leibovich Oct 06 '11 at 14:37
  • @Elazar Leibovich sure thing. but to construct object you will end up using placement new. Actually there is no such thing as placement delete :D. Destructors are used instead. – GreenScape Oct 07 '11 at 07:44