
std::vector's new data() method has both const and non-const overloads.
However, std::string's data() method has only a const overload.

I think they changed the wording about std::string so that the chars are now required to be contiguous (like std::vector).

Was std::string::data just missed? Or is there a good reason to only allow const access to a string's underlying characters?

Note: std::vector::data has another nice feature: calling data() on an empty vector is not undefined behavior, whereas &vec.front() is undefined behavior if the vector is empty.
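As a rough illustration of that note, here is a minimal sketch of handing a vector's storage to a C-style function without an emptiness check. The names `sum_ints` and `sum_vector` are hypothetical and exist only for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C-style consumer: it never dereferences p when count == 0.
long sum_ints(const int* p, std::size_t count) {
    assert(p != nullptr || count == 0);
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) total += p[i];
    return total;
}

long sum_vector(const std::vector<int>& v) {
    // v.data() is well defined even when v is empty; the returned pointer
    // simply must not be dereferenced in that case.
    return sum_ints(v.data(), v.size());
}
```

With `&v.front()` in place of `v.data()`, the empty case would need a separate branch.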

deft_code
  • 1
    I didn't know `std::vector::data` returns null when the vector is empty. Why is that a nice feature? – R. Martinho Fernandes Sep 22 '11 at 17:16
  • 2
    personally I prefer to use 'empty' to check if a string or vector is empty, but that is just me. – AndersK Sep 22 '11 at 17:19
  • 2
    @R.MartinhoFernandes As you can easily supply the vector data to a function taking a pointer and coping with null pointers itself, without checking for emptiness yourself. Not an important feature, but a nice one. – Christian Rau Sep 22 '11 at 17:21
  • @ChristianRau Wouldn't such a function take a size parameter? – R. Martinho Fernandes Sep 22 '11 at 17:22
  • @R.MartinhoFernandes: Oops, that's wrong. It's implementation defined what it returns, whereas `std::string::data` is undefined when it's empty. – deft_code Sep 22 '11 at 17:24
  • 2
    Anyways, the point is moot. `std::vector::data` is not spec'd to return NULL. – R. Martinho Fernandes Sep 22 '11 at 17:26
  • 4
    @Anders: `f(v.empty() ? NULL : &v.front())` is quite a mouthful, though, compared to `f(v.data())`. – sbi Sep 22 '11 at 17:28
  • "_std::string::data is undefined when it's empty_" What? – curiousguy Oct 01 '11 at 16:21
  • @curiousguy, if `std::string::empty` returns true, then calling `std::string::data` evokes undefined behavior. – deft_code Feb 06 '12 at 18:31
  • @deft_code I understand what you wrote. I was asking where you got the bizarre idea that "_std::string::data is undefined when it's empty_". – curiousguy Feb 18 '12 at 04:06
  • @curiousguy, I don't know, I was wrong. I just read that section again (§24.2.4.3-1). The exact value is undefined but the behavior is well defined, `[data(),data()+size())` must be a well defined range, but when `size()` is zero `data()` can be anything and be a well defined empty range. – deft_code Feb 18 '12 at 23:42
  • [For the curious, I've reported this to the standards committee](http://cplusplus.github.io/LWG/lwg-active.html#2391). – Cornstalks Jun 03 '14 at 14:40

4 Answers


In C++98/03 there was good reason not to have a non-const data(): string was often implemented with copy-on-write (COW). A non-const data() would have required a copy to be made whenever the refcount was greater than 1. While possible, this was not seen as desirable in C++98/03.

In Oct. 2005 the committee voted in LWG 464, which added the const and non-const data() to vector, and added const and non-const at() to map. At that time, string had not yet been changed so as to outlaw COW. But later, by C++11, a COW string is no longer conforming. The string spec was also tightened up in C++11 such that it is required to be contiguous, and there's always a terminating null exposed by operator[](size()). In C++03, the terminating null was only guaranteed by the const overload of operator[].

So in short a non-const data() looks a lot more reasonable for a C++11 string. To the best of my knowledge, it was never proposed.
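Given the C++11 contiguity guarantee, `&s[0]` already yields a writable `char*` into the string's storage, which is roughly what a non-const data() would provide. A minimal sketch, assuming a C++11 compiler; `format_int` is a hypothetical helper invented for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats an int by letting snprintf write straight
// into the string's contiguous storage through &s[0].
std::string format_int(int value) {
    const int n = std::snprintf(nullptr, 0, "%d", value);  // length needed
    assert(n >= 0);
    // Reserve one extra character so snprintf's terminating '\0' lands
    // inside [0, size()) rather than on the string's own terminator.
    std::string s(static_cast<std::size_t>(n) + 1, '\0');
    std::snprintf(&s[0], s.size(), "%d", value);
    s.resize(static_cast<std::size_t>(n));                 // drop the extra '\0'
    return s;
}
```

The extra character is a conservative choice: it avoids writing to the position at `size()`, which C++11 does not allow to be modified.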

Update

charT* data() noexcept;

was added to basic_string in the C++1z working draft N4582 by David Sankel's P0272R1 at the Jacksonville meeting in Feb. 2016.

Nice job David!
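A small sketch of what the new overload allows. It assumes `__cplusplus` reliably reports C++17 support on the compiler in use, and falls back to `&s[0]` otherwise; `upcase_in_place` is a hypothetical name:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: upper-cases a string in place through a raw char*.
void upcase_in_place(std::string& s) {
#if __cplusplus >= 201703L
    char* p = s.data();                      // non-const overload (C++17)
#else
    char* p = s.empty() ? nullptr : &s[0];   // pre-C++17 fallback
#endif
    assert(p != nullptr || s.empty());
    for (std::size_t i = 0; i < s.size(); ++i) {
        p[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(p[i])));
    }
}
```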

Howard Hinnant
  • 3
    So it looks like the missing non-const data method was just forgotten. How does one file a Defect Report against the standard? – deft_code Sep 22 '11 at 18:53
  • @deft_code Indeed, it seems so. Another oversight may be the lack of data() in std::initializer_list, which has size() and the iterators begin/end, but not data() for template genericity with vector and string (since using &(*begin()) means dereferencing a potentially invalid iterator if the container is empty). – Dwayne Robinson Dec 06 '13 at 03:31
  • 1
    In case anyone from the future finds this useful, I've asked about this on [std-discussion](https://groups.google.com/a/isocpp.org/d/topic/std-discussion/ll9HuEML6zo/discussion) and will submit a defect report (unless I'm told not to). Also, an alternative link for submitting issues is [here](http://isocpp.org/std/submit-a-library-issue) (it's not a massive page like the other link). Depending on how this goes, I may bring up `std::initializer_list` too. – Cornstalks Feb 12 '14 at 03:33
  • @Cornstalks I'd like to know how it turned out, can you share some follow-up? – Erbureth Jul 17 '14 at 14:40
  • 1
    In C++14 (N3937) there is no change. string still has only a const `data()`. – Howard Hinnant Jul 17 '14 at 14:46
  • @Erbureth: [it's been recorded as issue 2391](http://cplusplus.github.io/LWG/lwg-active.html#2391). Beyond that, not much has happened, unfortunately (but also understandably, as the committee obviously has a very long list of issues to review). – Cornstalks Jul 17 '14 at 16:23
  • @Cornstalks Maybe they will incorporate it into C++17, along with contiguous iterators and similar perks – Erbureth Jul 17 '14 at 19:28

Historically, string's data() has been const-only because a non-const version would have prevented several common optimizations, like copy-on-write (COW). COW is now, IIANM, far less common, because it behaves badly in multithreaded programs.

BTW, yes they are now required to be contiguous:

[string.require].5: The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().
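The identity in that paragraph can be checked directly. A minimal sketch; `is_contiguous` and `contiguity_demo` are illustrative names, not part of the standard library:

```cpp
#include <cassert>
#include <string>

// Returns true if &*(s.begin() + n) == &*s.begin() + n for all 0 <= n < size().
bool is_contiguous(const std::string& s) {
    typedef std::string::difference_type diff_t;
    const diff_t size = static_cast<diff_t>(s.size());
    for (diff_t n = 0; n < size; ++n) {
        if (&*(s.begin() + n) != &*s.begin() + n) return false;
    }
    return true;  // vacuously true for an empty string; begin() is never dereferenced
}

// Small self-check; on a conforming C++11 implementation both asserts hold.
void contiguity_demo() {
    assert(is_contiguous("hello"));
    assert(is_contiguous(std::string()));
}
```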

Another reason might be to avoid code such as:

std::string ret;
strcpy(ret.data(), "whatthe...");

The same goes for any other function that writes into a preallocated char array.
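The problem with that snippet is that `ret` is empty, so there is no storage to copy into. For contrast, a hedged sketch of a version that sizes the string first; `copy_via_pointer` is a hypothetical name, and `&ret[0]` stands in for a non-const data():

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies a C string into a std::string via a raw pointer.
std::string copy_via_pointer(const char* src) {
    const std::size_t n = std::strlen(src);
    std::string ret;
    ret.resize(n);                        // the step the snippet above skips
    if (n != 0) {
        std::memcpy(&ret[0], src, n);     // writes only inside [0, size())
    }
    assert(ret.size() == n);
    return ret;
}
```

`memcpy` is used instead of `strcpy` so that nothing is written to the position at `size()`.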

rodrigo
  • 1
    How does this answer the question? Given that `std::basic_string<>`'s storage is now required to be contiguous, why wasn't a non-const overload of `std::basic_string<>::data()` added? – ildjarn Sep 22 '11 at 17:30
  • @rodrigo, sorry I wasn't clear. I meant, is there a good reason to only allow const access to the character data _via the data() method_. Thanks for the contiguous storage reference. – deft_code Sep 22 '11 at 17:43
  • I don't see how a non-const `data()` is any worse for COW than, say, the existing possibility of getting a non-const-qualified pointer from non-const `&front()`. In both cases the implementation would have to perform the copy before returning the address -- contiguous or not, the problem for COW is that if the user can modify the element through a pointer, then the referent must be an element of that instance alone. – Steve Jessop Sep 22 '11 at 18:00
  • 1
    @SteveJessop AFAIK, basic_string doesn't have a `front()` member, so the only ways to get pointers to the characters are &*iterators, `&s[x]`, `c_str()` and now `data()`. I guess that it has to do with use-cases. That is, if you use `operator[]`, you are likely to modify the string, but if you use `c_str()/data()` you are likely not to modify it. YMMV. – rodrigo Sep 22 '11 at 18:27
  • @rodrigo: it doesn't have `front()` in C++03, it does in C++11. I think that COW was always a fudge, since `operator[]` is supposed to return `basic_string::reference`, not some proxy type whose `operator=` does the copy. So to be conforming, it really needs to be copy-on-calling-a-non-const-function rather than waiting for the actual write. But I forget the details, it's a very hair-splitting argument. – Steve Jessop Sep 22 '11 at 18:30
  • @SteveJessop I agree. The only sensible way to support the COW is to make a difference between mutable and non-mutable strings, that is two different classes. And that's not going to happen to C++ right now – rodrigo Sep 22 '11 at 19:10
  • Trying to design a specification that allowed COW on a type that is also a Container (and really a variant of std::vector), leaking references to the stored objects, was really a pain, and caused major ugliness, notably the fact that the first call to non-`const` `begin()` invalidates `const_iterator`s obtained previously via `const` `begin()`. This never happens with other containers. There is a fundamental mismatch between "STL conformance" and COW. – curiousguy Oct 02 '11 at 02:19
  • 1
    I like the term "copy on fright" to describe the "copy on write and non-const blah blah blah". The string gets copied on write, or if you just frighten it by taking a reference or shouting "Boo!" at it. (I first heard that term used by Andy Sawyer). – Jonathan Wakely Dec 08 '15 at 16:48

Although I'm not that well-versed in the standard, it might be because std::string doesn't need to contain null-terminated data (but it can), and it doesn't need to contain an explicit length field (but it can). So changing the underlying data, e.g. adding a '\0' in the middle, might get the string's length field out of sync with the actual char data and thus leave the object in an invalid state.

Christian Rau
  • The spec says that `data()` and `c_str()` _"shall not alter any of the values stored in the character array."_ I think that means they can't add a `'\0'`, but maybe that doesn't count because the `'\0'` would be outside the range [0,`size()`). – deft_code Sep 22 '11 at 17:36
  • 1
    No, you *can* have embedded '\0's in a std::string, and they should work just like any other character. Don't try to `printf()` them though. – rodrigo Sep 22 '11 at 18:31
  • See my answer for clarification about `std::string` (not enough room here) – curiousguy Oct 02 '11 at 02:39

@Christian Rau

Ever since the original Plauger string class (around 1995, I think) was STL-ized by the committee (turned into a Sequence, templatified), std::string has been std::vector plus string-related stuff (conversion from/to 0-terminated, concatenation, ...), plus some oddities, like COW that's actually "Copy on Write and on non-const begin()/end()/operator[]".

But ultimately a std::string is really a std::vector under another name, with a slightly different focus and intent. So:

  • just like std::vector, std::string has either a size data member or both start and end data members;
  • just like std::vector, std::string does not care about the value of its elements, embedded NUL or others.

std::string is not a C string with syntax sugar, utility functions and some encapsulation, just like std::vector<T> is not T[] with syntax sugar, utility functions and some encapsulation.
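The point about embedded NULs can be illustrated with a short sketch; `with_embedded_nul` and `embedded_nul_demo` are hypothetical names used only here:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Builds a 3-character string whose middle character is NUL.
std::string with_embedded_nul() {
    return std::string("a\0b", 3);  // explicit length, so the NUL is kept
}

void embedded_nul_demo() {
    const std::string s = with_embedded_nul();
    assert(s.size() == 3);                 // the NUL is an ordinary element
    assert(s[1] == '\0');
    assert(s[2] == 'b');
    assert(std::strlen(s.c_str()) == 1);   // C functions stop at the first NUL
}
```

The size is tracked independently of the element values, so writing a '\0' into the middle of the string does not put the object in an invalid state.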

Gabe Sechan
curiousguy