I just noticed that, according to the most recent C++ ISO standard, the string::pop_back
and string::erase
(to name two, there are probably others) member functions have unspecified complexity. What is the reasoning behind leaving the complexity a choice of the library coders? Are there actually any implementations of which anyone knows that have non-constant complexity for string::pop_back
?

- 362,284
- 104
- 897
- 1,065

- 1,520
- 2
- 15
- 33
-
1String interning could make `pop_back` O(N). http://en.wikipedia.org/wiki/String_interning . Not sure why `erase` is unspecified. – luiscubal Oct 06 '13 at 21:55
-
But C++11 implementations can't use string interning, since `s1.data() != s2.data()` when `&s1 != &s2`. – aschepler Oct 06 '13 at 22:08
-
1Anything modifying the size of the string may potentially cause its underlying storage to be moved, depending on how much has been changed. At least some versions of glibc use SSO (small string optimization) internally. While typically SSO would only be during the initialization of the string -- allowing the string to be "upgraded" to dynamic storage, there may be situations in which a string's content becomes significantly smaller than its allocated space, allowing moving to a smaller buffer. – Mark Nunberg Oct 06 '13 at 22:33
-
1Operations which change the string size may invoke the allocator, and the allocator's complexity is unspecified (since it is not within the library's control in general). – Raymond Chen Oct 06 '13 at 22:33
-
Just read it or try it. If you find out that your library's implementation is not O(1) then you have a very good reason to start shopping. – Hans Passant Oct 06 '13 at 22:36
-
@Raymond Chen, isn't the same true when changing a vector's size? `vector::pop_back` has a defined constant complexity. – user904963 Oct 06 '13 at 22:42
-
`vector::pop_back` is not allowed to reallocate. "Iterators, pointers and references referring to other elements that have not been removed are guaranteed to keep referring to the same elements they were referring to before the call." – Raymond Chen Oct 06 '13 at 22:54
2 Answers
TLDR; Because history, and that time complexities haven't been proposed yet
The situation:
[basic.string] only directly specifies that a few operations have a certain complexity:
size()
: O(1) since C++11max_size()
: O(1) since C++11shrink_to_fit()
: O(n) in C++17 (although added in C++11)operator[]
: O(1) since C++11swap()
: O(1)data()
: O(1) since C++11
It indirectly specifies many more (and I'm probably missing a few here):
length()
: equal tosize()
empty()
: equal tosize()
at()
: equal tooperator[]
front()
: equal tooperator[]
back()
: equal tooperator[]
copy()
: O(n) (comes from its character traits)assign()
: O(n) (comes from its character traits)find()
and variations : O(n) (comes from its character traits)append(char* s)
: O(m) wherem
is the length ofs
(comes from its character traits)
The requirement on data()
here is key. Before C++11, the underlying storage for a string was implementation-defined. It could have been a linked list for all that matters, so long as it could be translated into a c-style array when needed. After C++11, the underlying implementation was still platform dependent, but the requirement that data()
be O(1) all but guaranteed that storage of the string was in a contiguous C-style array (no more lazily copying your linked list). In C++17, data
was overloaded to return a non-const character array that you could modify, and such modifications would be the same as using operator[]
to do the same, which further solidified the implementation (FWIW the storage is still implementation-dependent, but there's no way to satisfy the time complexity requirements otherwise).
You can see that C++03's only performance requirement was on swap
, and this reflects the philosophy of the standard for a long time; prefer to specify only the behavior of an object and let the implementations take care of performance guarantees. This gives library implementers more flexibility to optimize for whatever they saw fit.
What's happened historically
When you dig into the proposals that added the complexity wordings (like n2923: Specifying the complexity of size()) you'll see that these complexity requirements are getting added piecemeal as people consider them.
Heck, the non-const data()
was added in C++17 simply because it wasn't proposed before (link to std discussion on the matter (really just links back to an answer our friend Howard Hinnant posted on StackOverflow))
Copy-On-Write
Int n8215, the author discusses basic_string
in detail, citing copy-on-write as one of the initial philosophies behind its design (although it didn't quite get there)
On the other hand, the current specifications of basic_string are intended to allow reference counted strings for which copy-on-write (COW) is essential.
(Wikipedia, citing Scott Meyers agrees).
If the standard was going to allow copy-on-write, it made sense to not make performance guarantees in the standard because writes to a string could either be O(1) or O(N). O(N) would be correct, I suppose, but it's a loose bound that could be misleading.
In fact, Daniel Krügler wondered the same thing as you in LWG 876: basic_string access operations should give stronger guarantees and also cited copy-on-write as a vestige:
Due to earlier support for copy-on-write, we find the following unnecessary limitations for C++0x:
... 2. Missing complexity guarantees:data()
andc_str()
simply return a pointer to their guts, which is guaranteed O(1). This should be spelled out.
So it looks like it's a mixture of allowing implementors flexibility, a discarded idea of supporting copy-on-write strings, and just a lack of people thinking about it.
Feel free to propose time complexities for basic_string functions where they are missing. The standard may be better off for it :-)

- 39,700
- 8
- 109
- 143
A long time ago, there was the idea that you could implement std::string
using ropes. The SGI STL even contained a rope
template with a very string
-like interface.
But it turned out that people used the subscript operator and the c_str()
method a lot, more than efficient concatenation, which is why rope-based implementations of std::string
were not competitive in practice.

- 32,022
- 3
- 48
- 92
-
Really interesting stuff. I'd love to see some of the standards discussions or proposals on the matter if you can find any. – AndyG Jul 23 '17 at 16:55
-
1Maybe [From Mechanism to Method: Distinctly Qualified](http://collaboration.cmc.ec.gc.ca/science/rpn/biblio/ddj/Website/articles/CUJ/2001/cexp1905/henney/henney.htm) is informative in this regard. But doesn't really discuss committee decision details; I expect that for middle-age C++, there is no detailed written history. – Florian Weimer Jul 23 '17 at 17:10