11

I wanted to know if someone knew the implementation of the C++ string::erase function and it's complexity. I know the C++ string is an object of characters. I'm assuming it doesn't allocate and create a new object of characters and then copy over the characters from the old string O(n) and O(n) space. Does it shift over the characters O(n) and O(1) space? I've looked on cplusplus.com and Bjarne Stroustrup book but haven't been able to find the answer. Can someone point me to the source code where it's implemented or know the answer?

Thanks!

B Stoops
  • 111
  • 1
  • 3
  • 3
    Just know that it'll be better than any house made implementation. So as a user perspective it is O(AlwaysBetterThanYourImplementation) :) – Omid CompSCI Oct 18 '18 at 03:23
  • You could look in the standard: https://stackoverflow.com/a/4653479/14065 – Martin York Oct 18 '18 at 03:25
  • 2
    @OmidCompSCI that sounds like an expression of faith more than actual knowledge. Many algorithms are simple enough that the optimal solution is obvious, in which case most house-made implementations will end up being in the same O() class as the standard-library's solution. Almost certainly `std::string::erase()` is in that category (likely implemented in O(N) via `memmove()` or something equivalent) – Jeremy Friesner Oct 18 '18 at 03:34
  • 1
    The source code you want to read is on your own computer in the `string` header file or in some file included by it. – n. m. could be an AI Oct 18 '18 at 03:34
  • It does not specify the complexity. see `24.3.2.6.5 basic_string::erase [string.erase]` It specifies the `Effects`. It does not require that we don't re-allocate new space (as this would prevent copy on modification implementations). But it's fair to assume that unless you have a strange implementation it will not re-allocate (as the copy on modification attempts have been dropped). – Martin York Oct 18 '18 at 03:35
  • 1
    It's going to be O(n) in the worst-case regardless of if it shifts the elements back or allocates a new string. – Bilal Saleem Oct 18 '18 at 03:41
  • 1
    Thank you all for the responses. Thank you Martin for showing me the standards. Reading through the standards, it seems that it makes a copy of the substring from the starting index to pos index, (skipping the xlen characters) and then taking a copy of the ending characters pos + xlen to size(). Which would mean it would be O(n) and O(n) space unless I'm mistaken. – B Stoops Oct 18 '18 at 03:48
  • 1
    @BStoops: I don't see a requirement to make a copy. It can re-use the same space if it wants. It describes what the resultant string should look like if you made a copy but it does not require you to make that copy. – Martin York Oct 18 '18 at 03:51
  • @MartinYork: Copy-on-write has been disallowed since C++11. – Davis Herring Oct 18 '18 at 05:06
  • @JeremyFriesner: So be careful to say that it’s O(your code) but might not be o(your code). Full Landau notation for the win! – Davis Herring Oct 18 '18 at 05:15
  • @DavisHerring Was it? The last I heard was several implementations tried it and it was not very good so everybody abandoned the idea. It is also very unlike the standard to say how to (or how not too) implement something. Usually the standard defined behavior. But let's assume you are correct. Just replace "Copy on Write" with "Interesting implementation Y" – Martin York Oct 18 '18 at 05:20
  • @MartinYork: You remember correctly; one of the issues was thread-safety (and paying for it when unneeded). That implementation *is* precluded, *by* prescribing behavior: references [aren’t invalidated](http://eel.is/c++draft/basic.string#string.require-4.2) by `operator[]`, so there’s no way to notice a (potential) write so as to copy. – Davis Herring Oct 18 '18 at 13:43
  • 1
    @DavisHerring and MartinYork Thank you for your responses, very enlightening discussion. So it is my understanding that it is O(n) and makes a copy so O(n) space. – B Stoops Oct 19 '18 at 22:16
  • @DavisHerring A follow up would be does anyone know of a more efficient way to do this operation – B Stoops Oct 19 '18 at 22:21
  • @BStoops: Real implementations will copy only if the string is (or rather *becomes*) very small, so it doesn’t matter. And the O(n) *time* is only significant when erasing near the beginning of a long string. Those *do* sometimes matter, but you have to switch *data structures*, not algorithms: see the “rope” concept. – Davis Herring Oct 19 '18 at 23:44
  • @DavisHerring Thanks for the response. I'm intrigued to learn more about the "rope" concept, do you have a good resource for it? – B Stoops Oct 22 '18 at 17:41

1 Answers1

8

As stated in the comments, the standard doesn't specify any complexity requirements for many basic_string functions (including erase). This is partly because there have historically been a number of different implementation strategies (most famously, copy-on-write was popular in the C++98 era), so the committee was reluctant to specify anything too precisely.

The basic behavior of typical, modern implementations is very much like vector<char>: insertion and deletion are cheap at the end (with amortized reallocations) and expensive at the beginning. Small strings are handled without memory allocation at all (by reusing the pointers' space to store the characters). This means that erasure can result in copying the whole string if it becomes very short (but then the copy is cheap). It also means that iterators and references are invalidated more easily than with vector.

There aren't any algorithms that perform better on a string, but there are alternate data structures that have different performance characteristics. The rope is typical among these in that it provides somewhat slower access but much faster insertion and deletion.

Davis Herring
  • 36,443
  • 4
  • 48
  • 76
  • Great! Thank you for the breakdown and help! I would upvote but I don't have enough reputation yet. – B Stoops Oct 25 '18 at 21:03