1

A given string needs to be modified so that there are no periods. The rest of the string remains intact.

I thought of using an iterative approach and using string::erase. In java, I could have easily accomplished it with replaceAll function. Thinking along those lines a developer published a solution :

        temp.erase(remove(temp.begin(),temp.end(),'.'),temp.end());

The line of code works perfectly. However I failed to understand how it works. As far as my understanding goes the erase function deletes all characters in a range. The remove function is returning the iterator to the point after removing the periods, and then the erase function is erasing from that point. I'm confused. Any ideas?

InferTech
  • 15
  • 6
  • 1
    Your understanding is correct. [Erase-Remove idiom](https://en.wikipedia.org/wiki/Erase%E2%80%93remove_idiom). – Eljay Jul 30 '19 at 18:26
  • related/dupe: https://stackoverflow.com/questions/347441/erasing-elements-from-a-vector – NathanOliver Jul 30 '19 at 18:27
  • "Any ideas" about what, exactly? What is your question? – Drew Dormann Jul 30 '19 at 18:27
  • Possible duplicate of [Does C++ standard library provide more compact and generalized version of the erase–remove idiom?](https://stackoverflow.com/questions/56859485/does-c-standard-library-provide-more-compact-and-generalized-version-of-the-er) – L. F. Jul 31 '19 at 00:54

2 Answers2

3

It's easy to get confused but the remove function is not removing all the periods.

Suppose before remove your string was "a.b.c." then after remove it would be "abc...". What the erase method is doing is erasing the periods that are now at the end of the string.

Now it's a reasonable question to ask why remove behaves this way, but think about what remove has, it only has a pair of iterators. Iterators cannot be used to remove items from a container, an iterator does not give you access to the container that it refers to. So remove (and other similar functions) cannot delete anything from a container. All they can do is rearrange the order so that some later code can easily do the actual deleting.

There's a name for this idiom but it escapes me at the moment.

john
  • 85,011
  • 4
  • 57
  • 81
  • oh so the erase is deleting the last three dots after remove is altering the string? Great! That explains! – InferTech Jul 30 '19 at 18:30
  • 1
    After `remove` the string would most likely be "abc.c." since it moves good elements to the beginning. It's more like a move-left-if than a partition operation. – Blastfurnace Jul 30 '19 at 18:34
  • @Blastfurnace Yes good point. Formerly I think the tail portion of the container is unspecified. – john Jul 30 '19 at 18:36
  • 2
    @InferTech You got it, this is the [erase-remove idiom](https://en.wikipedia.org/wiki/Erase%E2%80%93remove_idiom) – john Jul 30 '19 at 18:36
  • The erase/remove idiom is also very efficient. It's just a swap for each match. Erasing elements one at a time is slow because all the elements after each one have to be moved down for each item erased. So even w/o the iterator issue, this would be a poor approach. – doug Jul 30 '19 at 19:33
  • @doug, it's not exactly as "swap" but rather a "move to end" which is essentially a `memmov` of the rest of the string (or container) from the from the position after current to current while placing the current at the end. – David C. Rankin Jul 30 '19 at 20:20
  • @DavidC.Rankin, duh, you're right. That suggests the erase/remove idiom isn't all that efficient if there are lots of matches. It may be be faster to do a conditional copy. For instance, to a preallocated vector. I've been coding as if the erase/remove idiom was pretty optimal. For vectors even with move capability that aren't POD this could be much slower than a selected copy. Thanks for bringing this to my attention. – doug Jul 30 '19 at 21:52
  • @doug it's as efficient as it can be. You really have 2-options. Either `memmov` the rest of the string forward overwriting the character and moving the character to the end, or build a temporary container without character keeping count and put that many at the end. I haven't checked the libc++ source, but either would be fine. – David C. Rankin Jul 31 '19 at 02:50
  • @DavidC.Rankin, I'd bet that `std::copy_if` into another, size initialized, container would be faster. It should eliminate a lot of unnecessary moves in sets with a reasonable percentage of matches. I haven't benchmarked it but will consider this the next time I run across the need. – doug Jul 31 '19 at 03:54
  • I bet you are right, I was just thinking what the most basic underlying call would be. That's why I mentioned I had not looked at the source yet. Whether it is a `std::copy_if` or a `while (strchr (str.c_str(), c)` both would probably optimize to essentially the same assembly code. – David C. Rankin Jul 31 '19 at 04:10
  • @DavidC.Rankin Nothing is "moved to the end" when calling [`std::remove`](https://en.cppreference.com/w/cpp/algorithm/remove). It's not a partition operation. Each good element to be kept is moved to the left at most once. The elements after the new logical end have unspecified values. – Blastfurnace Aug 01 '19 at 04:22
  • @Blastfurnace thank you. As noted multiple times I did not consult libstdc++. But question, if you have `A.B.C.` and call `std::remove` and each good element to be kept is move to the left at most *once*, how do you end up with `ABC...`? If each is moved to the left only once, are you not left with `AB.C..`? – David C. Rankin Aug 01 '19 at 06:24
  • @DavidC.Rankin I should have been clearer. Each element is only moved a single time but it can jump multiple places to the left in that single move. If you start with `A.B.C.` the result should be `ABC.C.`, the `C` moved two slots to the left but was only **touched** once. That's why the algorithm is so efficient. – Blastfurnace Aug 01 '19 at 06:42
  • @Blastfurnace Ok, thank you, that lets all the gears fit together upstairs nicely. – David C. Rankin Aug 01 '19 at 06:43
0

std::remove acts on a range. A range is a pair of iterators denoting its begin and end, respectively.

std::string::erase, or any containers erase, acts on a string (/container).

Removing an element from a range can be thought of as returning a new range. For example you remove an element from a range by placing it at the end and the new range is from begin till end-1.

Erasing an element from a container is something different. The container owns the element and is responsible for actually "erasing" the element.

In summary: You start will all elements in the range made up by temp.begin() till temp.end() passed to std::remove which returns you the end of the new range (after shuffling the elements) and finally std::string::erase shrinks the string to contain only that new range of characters.

463035818_is_not_an_ai
  • 109,796
  • 11
  • 89
  • 185