Null terminated string, is it really dictated by the standard?

Question

Discussion

It is commonly said that, from C++11 onward, a std::basic_string has a null-terminated internal storage buffer.
The main reason for this change, among others, was that the previous definition of std::basic_string allowed only very limited concurrent access to strings and thus limited performance for multi-threaded applications. (More on the reasons for the changes in std::basic_string can be read in proposal N2534.)
However, reading the standard, I couldn't find a passage that explicitly states that std::basic_string must have a null-terminated internal storage buffer.
The only passage I've found that implies it is §21.4.7.1/1&3 basic_string accessors [string.accessors]:

const charT* c_str() const noexcept;

const charT* data() const noexcept;

1 Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].

3 Requires: The program shall not alter any of the values stored in the character array.

I assume that, for efficiency and because §21.4.7.1/3 requires that the program not alter the returned array, most implementations of std::basic_string::c_str() and std::basic_string::data() return the null-terminated internal buffer.
However, the standard doesn't state anywhere that the buffer returned by std::basic_string::c_str() and std::basic_string::data() must be the internal storage buffer of the std::basic_string rather than some null-terminated copy.
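
To see what the quoted Returns clause means on a particular implementation, the identity can be checked directly. This is only a sketch: the names `c_str_matches_subscript` and `check_accessors` are made up for illustration, and a passing check on one compiler says nothing about what the standard mandates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of any library): checks the quoted identity
// p + i == &operator[](i) for every i in [0, size()], using the const
// operator[] so that i == size() is a permitted index.
bool c_str_matches_subscript(const std::string& s) {
    const char* p = s.c_str();
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (p + i != &s[i]) {
            return false;
        }
    }
    return true;
}

// Hypothetical driver: runs the check on an empty and a non-empty string.
void check_accessors() {
    const std::string empty;
    const std::string text("hello");
    assert(c_str_matches_subscript(empty));
    assert(c_str_matches_subscript(text));
    // c_str() and data() share the same Returns clause in C++11.
    assert(text.c_str() == text.data());
}
```

Calling `check_accessors()` from `main` is enough to run it. Note that the loop includes `i == size()`, which is exactly the index the question is about.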

Questions:

1. Is it explicitly stated somewhere in the standard that the internal storage buffer of std::basic_string must be null-terminated?
2. If there is no explicit statement (i.e., the short answer to question #1 is no), does this mean that an implementer could implement std::basic_string without a null-terminated internal storage buffer, and consequently that the widespread notion that strings are null-terminated since C++11 is wrong?

Also, I vaguely recall there's at least one `std::string` operation where you can explicitly dereference but not advance the `end` iterator. – Mooing Duck, Jul 31 '14 at 20:52
Reading the answers to the duplicate, I've come to the conclusion that there's no explicit requirement for a null-terminated buffer, but that it is implicitly forced. Or am I wrong? – 101010, Jul 31 '14 at 21:07

Igor Tandetnik · score 5 · Answer 1 · 2014-07-31T20:57:04.233

From 21.4.5:

const_reference operator[](size_type pos) const;
reference operator[](size_type pos);

1 Requires: pos <= size().

2 Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.

Note that s[s.size()] is well-defined and required to return a NUL character.

However, that doesn't by itself require NUL-terminated internal storage. 21.4.1/5 has this to say:

The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().

Note that contiguous storage is only required up to n < s.size(), but not for s.size() itself. So char* p = &s[0]; doesn't necessarily point to a NUL-terminated buffer, as the standard doesn't require that p[s.size()] be valid.

Practically speaking, given the combined requirements on data(), c_str() and operator[], any sane implementation would maintain NUL-terminated storage. But insane implementations don't appear to be precluded by the standard.
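
In code, the distinction drawn above looks roughly like this. It is a sketch only, and the function names are invented for illustration.

```cpp
#include <cassert>
#include <string>

// Covered by the quoted operator[] wording: pos == size() yields charT().
char terminator_via_subscript(const std::string& s) {
    return s[s.size()];
}

// Covered by the c_str() wording: the returned pointer p satisfies
// p + size() == &operator[](size()).
char terminator_via_c_str(const std::string& s) {
    return s.c_str()[s.size()];
}

// Deliberately not shown: reading (&s[0])[s.size()] through a raw pointer,
// which is the access that 21.4.1/5 on its own does not cover.
void check_terminators() {
    const std::string s("abc");
    assert(terminator_via_subscript(s) == '\0');
    assert(terminator_via_c_str(s) == '\0');
}
```

Both helpers go through member functions whose wording mentions index size(), which is why they stay on the well-defined side of the argument.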

edited Jul 31 '14 at 20:57

answered Jul 31 '14 at 20:51

Igor Tandetnik

But that quote does not require the internal storage itself to be null-terminated. It only requires `[]` to return a reference to internal storage if an index `< size()` is requested, and a reference to a null character otherwise. That does not require the null to come from internal storage; it could come from an internal constant literal instead (which is what `c_str()` typically returns a pointer to when `size()` is 0). Nothing in this quote requires `s[s.size()-1]` and `s[s.size()]` to return references to consecutive data in memory. – Remy Lebeau Jul 31 '14 at 20:53
@RemyLebeau: Quite. I was actually in the process of writing just this in the second half. – Igor Tandetnik Jul 31 '14 at 20:57
It just occurred to me that this is a tricky requirement any way you cut it, because many `string` implementations allocate no memory for default construction. Since `std::string{}[0]` is required to return a readable reference, the implementation must special-case the `end` no matter what. – Mooing Duck Jul 31 '14 at 21:00
If `s.size()` is 0, `s[0]` returns a reference to a null character, and so `p[0]` is valid. If `s.size()` is >0, `s[0]` returns a reference to the beginning of internal storage, and `p[s.size()]` is valid only if the internal storage is null-terminated. – Remy Lebeau Jul 31 '14 at 21:03
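
The empty-string case from these comments can be written out as below. This is an illustrative sketch with made-up function names; it uses the const overload of operator[] so the returned reference is only read, never modified.

```cpp
#include <cassert>
#include <string>

// Reads element 0 of a default-constructed string. Here pos == size(),
// so the quoted wording requires the value to be charT().
char first_of_empty() {
    const std::string empty;
    return empty[0];
}

// c_str() on an empty string must still point at a readable null character.
bool empty_c_str_is_terminated() {
    const std::string empty;
    return *empty.c_str() == '\0';
}

void check_empty() {
    assert(first_of_empty() == '\0');
    assert(empty_c_str_is_terminated());
}
```

Whether that null character lives in an allocated buffer, in a small in-object buffer, or in a shared static object is left to the implementation; the code above cannot tell the difference.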

score 1 · Answer 2 · edited Jul 31 '14 at 21:59

In C++11, c_str() and data() are both required to return a pointer to the internal storage (sorry, I don't have a direct quote handy to back that up). In earlier versions, c_str() was not required to do that, but data() was. Implementors could (but rarely did) implement copy-on-write semantics so that the pointer returned by c_str() was not the original internal storage.

In all versions, c_str() is required to return a pointer to a null-terminated array. So, that implies that in C++11, the internal storage must be null-terminated.
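
The null-termination guarantee on c_str() is what makes it usable with C functions. A small sketch using std::strlen, which scans up to the first null character (the helper names `c_length` and `check_c_length` are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length as seen by a C API that only receives the c_str() pointer.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}

void check_c_length() {
    assert(c_length("hello") == 5);
    assert(c_length("") == 0);

    // A string may contain embedded nulls; size() counts them, strlen stops
    // at the first one.
    const std::string embedded("ab\0cd", 5);
    assert(embedded.size() == 5);
    assert(c_length(embedded) == 2);
}
```

This only demonstrates that the array returned by c_str() is terminated; it does not by itself show where the terminator is stored.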

edited Jul 31 '14 at 21:59

Greg Hewgill

answered Jul 31 '14 at 20:52

Remy Lebeau

There's nothing preventing `c_str()` from putting in the null value right before it returns (however, there must already be space for it). – Mooing Duck Jul 31 '14 at 20:54
data() existed in C++03. What version didn't it exist in? – Nevin Jul 31 '14 at 20:54
@Nevin oh, so it was. I thought it was new to C++11, but I was mistaken. – Mooing Duck Jul 31 '14 at 20:55
@MooingDuck, you may have been thinking of vector, which didn't get it until C++11. – Nevin Jul 31 '14 at 21:02
@MooingDuck: If the internal storage is large enough, c_str() could insert a null at the end before returning a pointer to the storage. Otherwise, it would have to make a copy of the storage data and then return a pointer to that copy. In earlier versions, that was allowed. In C++11, the data is not allowed to be relocated (i.e., copied), which implies that the internal storage must always be large enough to hold a null, even if it is not actually inserted until `c_str()` is called. – Remy Lebeau Jul 31 '14 at 21:10
