5

Some excerpts from the standard first:

Spec for string::operator[]():

const_reference operator[](size_type pos) const;
reference operator[](size_type pos);
Requires: pos <= size().
Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.
Complexity: constant time.

Spec for string::c_str() and string::data():

const charT* c_str() const noexcept;
const charT* data() const noexcept;
Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
Complexity: constant time.

By combining these two specs, we can see that the pointer p returned by c_str()/data() must be such that p[0...size()-1] designates the elements of the string and p[size()] is the same object as operator[](size()), which has the value charT(). Since pointer addition is used to reach both the string elements and the trailing charT() object, they must all be in a single array (you can't keep just the string elements in place, create a charT() object on the fly, and return that). So is it safe to say that in C++11, std::string is guaranteed to have the terminating null character in place in its underlying storage?
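To make the relation quoted above concrete, here is a minimal sketch that checks it on an ordinary std::string. The helper name `pointers_line_up` is made up for illustration; it only observes what a given implementation does and proves nothing about what the standard requires.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if the pointer from data() lines up with
// the const operator[] for every index in [0, size()], including the
// position one past the last character.
bool pointers_line_up(const std::string& s) {
    const char* p = s.data();
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (p + i != &s[i]) {
            return false;
        }
    }
    return true;
}
```

Calling `pointers_line_up(std::string("hello"))` should return true on a C++11-conforming library, since that is exactly the quoted Returns clause.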


EDIT: My apologies for not noticing that my question is a duplicate. However, the answers I'm getting here seem to conflict with the answer to the question it duplicates.

Also, I should have weakened my assumption like this: The underlying representation of std::string must reserve enough space to hold the terminating null character, but is free to defer setting the charT() value until c_str()/data() is called. (i.e. The underlying storage must be able to hold at least size()+1 elements at any time.)

goodbyeera

3 Answers

2

No it isn't. std::string can lazy-compute the required result for c_str() and data(). There's no guarantee that the null terminator is present in the underlying storage until those methods are called.

The standard only guarantees the contiguity of the string characters.
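As a rough illustration of the lazy approach described in this answer, here is a toy sketch. `LazyTerminatedString` and all of its members are hypothetical and do not correspond to any real standard library; the sketch also sidesteps `operator[]` entirely, so it takes no position on whether the standard actually permits such an implementation.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical toy type, not any real library's std::string.
// It always keeps room for size() + 1 characters, but writes the
// terminator only when c_str() is called.
class LazyTerminatedString {
public:
    explicit LazyTerminatedString(const char* text)
        : size_(std::strlen(text)), buffer_(size_ + 1, '?') {
        std::memcpy(buffer_.data(), text, size_);
        // buffer_[size_] deliberately still holds '?' here, not '\0'.
    }

    std::size_t size() const { return size_; }

    // Constant time: a single store, no reallocation and no copy.
    const char* c_str() const {
        buffer_[size_] = '\0';
        return buffer_.data();
    }

    // Illustration-only peek at the slot after the last character.
    char raw_slot_after_last() const { return buffer_[size_]; }

private:
    std::size_t size_;                  // declared first, so initialized first
    mutable std::vector<char> buffer_;  // mutable so c_str() can stay const
};
```

In this sketch the slot after the last character holds a placeholder until the first `c_str()` call, which is the situation the answer has in mind.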

Bathsheba
  • The standard requires the c_str and data functions to operate in constant time though, which as far as I can tell precludes an implementation from copying the data, as the time would then be O(N) in the length of the string. I guess it could just add the null terminator on demand, but that would seem unlikely. – jcoder Mar 11 '14 at 14:29
  • @jcoder It would not preclude a copy on a theoretical system where a memory copy of any length is constant time, or a system where, say, virtual address mappings could be made such that a copy was not required. Also this answer does not require a copy, for example: setting a 0 at the end of whatever `c_str()` and `data()` return on the fly. – Jason C Mar 11 '14 at 14:33
  • On the contrary: I'd have thought it *very* likely that all an STL would do is to append a null terminator and return the address of the start of the data storage. Bear in mind that any subsequent modification of the string invalidates the output of `c_str()`, so that seems entirely sensible. Small string optimisations give you the only real headache since, strictly speaking, the O(1) rule applies to that case too. – Bathsheba Mar 11 '14 at 14:34
  • @Bathsheba, modifications invalidate the result of `c_str()`, i.e., they could be anything. This leaves the implementation free to do as JasonC says. Or just to notice that your `string` comes from assigning `"Hello world!"` to it, and returning another (compile time) copy of that. – vonbrand Mar 11 '14 at 14:51
  • @JasonC: `setting a 0 at the end of whatever c_str() and data() return on the fly`, so is it true for a weakened assumption that the underlying storage of std::string must be able to hold at least size()+1 elements at any time so that it can set the charT() value when a call to c_str()/data() comes? – goodbyeera Mar 11 '14 at 16:13
  • @Bathsheba: This may be irrelevant to the question, but the restriction on the return value of c_str()/data() specified in C++03, "`The program shall not alter any of the values stored in the array. Nor shall the program treat the returned value as a valid pointer value after any subsequent call to a non-const member function of the class basic_string that designates the same object as this.`", seems to be gone in C++11. Only the `The program shall not alter any of the values stored in the character array.` part remains. What's that about? – goodbyeera Mar 11 '14 at 16:18
  • @goodbyeera It would reasonably be true for that weak assumption. Note also that `string` has a concept of `capacity()`. The thing is, most of your assumptions are *probably* true if it is assumed that the underlying implementation is reasonable and efficient; and that probably *is* true in all widely-used implementations. But just because something is generally reasonable doesn't mean that it's guaranteed to be so by the standard, which is what your question was. – Jason C Mar 11 '14 at 16:31
  • @JasonC: Yeah, I know `capacity()` is defined as `The size of the allocated storage in the string` and only `capacity() >= size()` is guaranteed without any call to `reserve()`. But consider the requirement for `operator[]()` and `c_str()/data()`, it would be at least unwise if not impractical for an implementation to allocate a `capacity()` just as large as `size()`. – goodbyeera Mar 11 '14 at 16:38
  • @Bathsheba: Your assumption of `c_str()`'s validity is not correct in C++11. C++11 removes the particular invalidation rule of `c_str()/data()` (invalidates after call to any non-const member function). Now only the general invalidation rule of `std::string` applies: "as an argument to any standard library function taking a reference to non-const basic_string as an argument." or "Calling non-const member functions, **except operator[], at, front, back, begin, rbegin, end, and rend**." So in C++11, the array returned by `s.c_str()` won't get invalidated after something like `s[i] = c;`. – neverhoodboy Mar 12 '14 at 04:41
  • @JasonC right... although I very much doubt such a thing exists in practical use – jcoder Mar 13 '14 at 13:20
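
The C++11 invalidation rule quoted in neverhoodboy's comment above can be observed with a short sketch. The function name is made up for illustration, and the result assumes a C++11-conforming library; it shows what one implementation does rather than proving the rule.

```cpp
#include <cstring>
#include <string>

// Hypothetical check: write through the non-const operator[] after taking
// c_str(), then confirm the old pointer is still the string's buffer and
// sees the new character.
bool c_str_sees_write_through_subscript() {
    std::string s = "hello";
    const char* p = s.c_str();
    s[0] = 'j';  // operator[] is on the C++11 "does not invalidate" list
    return p == s.c_str() && std::strcmp(p, "jello") == 0;
}
```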
1

So is it safe to say that in C++11, std::string is guaranteed to have the terminating null character in place in its underlying storage?

I don't see how you can reach this conclusion from the two quotes. In particular, I don't see anything in those quotes that would require the underlying implementation to maintain a terminating null character until you call c_str() or data().

Bathsheba
NPE
  • But `c_str()` and `data()` have an O(1) complexity requirement. – goodbyeera Mar 11 '14 at 14:38
  • OK, I see. I should've weakened my assumption like this: In order to achieve O(1) complexity, the underlying representation should at least reserve enough space to hold the terminating null character, but is free to defer setting the charT() value until c_str()/data() is called. What do you think of this assumption? (The underlying storage must be able to hold at least size()+1 elements at any time.) – goodbyeera Mar 11 '14 at 15:11
1

No... It's safe to say exactly what it says: That operator[](size()) == CharT() and that c_str()[size()] == 0 and that data()[size()] == 0.

Whether or not the "underlying storage" has a 0 in there is implementation defined, and also should have no effect on your program as there is no way to access it except through things like operator[] and c_str(), which are well-defined at size().
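
The three guarantees listed above can be written down as a small check. This is only a sketch: the helper name `reads_terminator_at_size` is hypothetical, and it tests observable behaviour through the public interface, which is all this answer claims is specified.

```cpp
#include <string>

// Hypothetical helper: true if reading index size() through operator[],
// c_str() and data() each yields a value-initialized CharT.
template <class CharT>
bool reads_terminator_at_size(const std::basic_string<CharT>& s) {
    return s[s.size()] == CharT() &&
           s.c_str()[s.size()] == CharT() &&
           s.data()[s.size()] == CharT();
}
```

It can be called with any character type, for example `reads_terminator_at_size(std::string("abc"))` or `reads_terminator_at_size(std::wstring())`.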

Jason C
  • `data()[size()]` amounts to inspecting the underlying storage. – R. Martinho Fernandes Mar 11 '14 at 14:35
  • But pointer addition means that those objects have to be in the same array. – goodbyeera Mar 11 '14 at 14:39
  • @R.MartinhoFernandes It probably does. I could conceive of all sorts of silly, unlikely implementations *allowable by the standard* where it wouldn't; for example perhaps a `string` implementation maintains two parallel arrays of data in different forms and considers one to be "underlying storage" and the other to be a publicly accessible C-style (0-terminated) "backup". ... – Jason C Mar 11 '14 at 16:34
  • ... Like I mentioned in another comment: The thing is, most of the assumptions here are *probably* true if it is assumed that the underlying implementation is reasonable and efficient; and that probably *is* true in all widely-used implementations. But just because something is generally reasonable doesn't mean that it's required to be so by the standard, which is what the question was. – Jason C Mar 11 '14 at 16:35
  • Nope, you cannot. Note that the pointer returned by `data` and the references returned by `op[]` must match *all the way to the terminator* ("Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()]."). Whatever storage is provided by these functions, it must be the same throughout the entire interface. Note also that `operator[]` is now required to return an actual reference to `value_type` and cannot return a proxy (it used to be specified to return `Allocator::reference`, but now it must return `value_type&`). – R. Martinho Fernandes Mar 11 '14 at 16:37
  • @R.MartinhoFernandes That doesn't contradict what I just said. By "publicly accessible" I did mean both through `data()` and `operator []`. If a `string` had a parallel "internal" array and a publicly accessible C-style array, the latter could certainly be returned both from `data()` and `operator[]`. Perhaps a clarification of what you mean by "underlying storage" is in order. – Jason C Mar 11 '14 at 16:40
  • You can conceive of implementations that store a thousand additional things, but they won't be exposed by the interface, and won't have any relevance. The publicly accessible C-style array is where the master copy of the data has to be because `s[i] = c;` works. – R. Martinho Fernandes Mar 11 '14 at 16:42
  • @R.MartinhoFernandes That's close to my point. All of those weird implementations would be allowable by the standard *and* the "underlying storage" would not be guaranteed to have a 0 terminator. The fact that `s[i] = c` works doesn't imply those arrays are the same; only that `operator[]` returns a ref into the publicly accessible array; an `operator[]` implementation could, say, flag the "underlying storage" for resync with the C array. But I think there is a general disagreement of what "underlying storage" means here (purely under-the-hood vs. the same thing returned by `data()`). – Jason C Mar 11 '14 at 16:47
  • "The fact that `s[i] = c` works doesn't imply those arrays are the same" Sure, it does not, but your additional array has no use! It might as well have `"foobar"` in it all the time. No one cares, because it's completely invisible. If the two disagree about the contents, it's the public copy that determines the content. In the "resync" case you describe, the additional array is merely an invisible backup of the master copy, which is the publicly exposed one. I am ready to dismiss any interpretation of "underlying storage" that means "the invisible backup" instead of "the master copy" as ridiculous. – R. Martinho Fernandes Mar 12 '14 at 09:44
  • @R.MartinhoFernandes Well, of course it is ridiculous, but the topic here is picking apart the semantics of standard definitions. – Jason C Mar 12 '14 at 17:05
  • @JasonC I don't want to be rude, but it seems to me that you're not aware of the new constraints on `data()` and `c_str()` since C++11. Those constraints affect how `operator[]` is allowed to work. See the discussion at http://stackoverflow.com/a/38506560/3537677 But I see now that this whole discussion might be a misunderstanding, due to the vague term "underlying storage" in the question. – Superlokkus Jul 22 '16 at 11:40